DOCUMENT RESUME

ED 227 181                                          UD 022 603

AUTHOR          Inbar, Michael
TITLE           Images or Aberrations? Human Judgment and Insight as
                Reflected in Current Regression Analyses.
SPONS AGENCY    National Council of Jewish Women, New York, N.Y.
                Research Inst. for Innovation in Education.
PUB DATE        Apr 82
NOTE            73p.
AVAILABLE FROM  Michael Inbar, Department of Sociology, Hebrew
                University, Mount Scopus, Jerusalem 91905, Israel
                (write for price).
PUB TYPE        Reports - Research/Technical (143)
EDRS PRICE      MF01/PC03 Plus Postage.
DESCRIPTORS     Evaluative Thinking; Higher Education; *Judgment
                Analysis Technique; Multiple Regression Analysis;
                Performance Factors; Policy Formation; Predictor
                Variables; *Statistical Bias; Student
                Characteristics; *Validity
IDENTIFIERS     *Beta Weights; *R2 Values

ABSTRACT
        In contrast to linear models of human judgment
developed for predictive purposes, which are characteristically
insensitive to the exact values of the weights utilized in them, the
Linear Multiple Regression Models used for policy capturing are
assumed to reflect significant aspects of the subjects' judgmental
policies. This latter kind of modelling is therefore justified only
to the extent that its underlying assumption is found to be true. Its
validity depends, however, on both that of the models' beta weights,
and of R2 as a twin measure of the subjects' cognitive
control/consistency and of one's success in capturing the judges'
policy. Although the problematic nature of beta weights is well
known, the present study shows that R2 is no less a problematic
measure. It is shown that the way in which self-insight is typically
elicited may induce a demand-response effect; furthermore,
traditional data analysis and comparison methods appear to be
inconsistent. In the study, eight subjects were presented with 72
profiles of prospective undergraduate students and were asked to
judge future academic performance, based on information regarding
each student's sex, age, ethnic origin, and socioeconomic and
educational background. The results indicate that current policy
capturing research and language cannot be accepted at face value.
(Author/GC)


***********************************************************************
*    Reproductions supplied by EDRS are the best that can be made     *
*                     from the original document.                     *
***********************************************************************


IMAGES OR ABERRATIONS?

Human Judgment and Insight as Reflected
In Current Regression Analyses

Michael Inbar

Department of Sociology

The Hebrew University of Jerusalem


"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY

Michael Inbar

TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."


U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

This document has been reproduced as
received from the person or organization
originating it.
Minor changes have been made to improve
reproduction quality.
Points of view or opinions stated in this document
do not necessarily represent official NIE
position or policy.


The present study has been sponsored by the National Council of Jewish Women
(NCJW) Research Institute for Innovation in Education, under the auspices of the
Barbara and Morton Mandel Chair in Cognitive Social Psychology and Education,
the Hebrew University of Jerusalem. The work reported is part of a study of
human judgment generically labelled Project MIBS (Models of Implicit Belief
Systems). Requests for reprints should be addressed to the author, Department
of Sociology, Hebrew University, Mount Scopus, Jerusalem 91905, Israel.


ABSTRACT

In contrast to the linear models of human judgment developed for predictive
purposes, which are characteristically insensitive to the exact values of the
weights utilized in them, the Linear Multiple Regression (LMR) models used for
policy capturing are assumed to reflect, partly through them, significant
aspects of the subjects' judgmental policies. This latter kind of modeling, be
it for research, for providing cognitive feed-back, for training or for
assessing the subjects' self-insight, is therefore justified only to the extent
that this underlying assumption is found to be so. Its validity depends,
however, on both that of the models' β's, and of R2 as a twin measure of the
subjects' cognitive control/consistency and of one's success in capturing the
judges' policy. The problematic nature of the beta weights has long been known.
The present study, based on 8 subjects, shows that R2 is no less a problematic
measure. Moreover, with the data of 4 of these subjects in one case, and with
that of 7 in the other, it is shown that the way in which self-insight is
typically elicited may induce a demand-response effect; additionally, the
traditional manner of analyzing and comparing these data appears to imply a
grave inconsistency. Together, these results indicate that current
policy-capturing research and findings cannot be accepted at face value. A list
of threats to the validity of these models and their application is offered. The
likelihood that studies which have disregarded their possible relevance and
impact on the results obtained may be reporting misleading findings is stressed.
In conclusion the dependence of the justification of the policy-capturing
endeavor on confronting these problems is pointed out.


IMAGES OR ABERRATIONS?
Human Judgment and Insight as Reflected
In Current Regression Analyses

INTRODUCTION

Background

Modeling human judgment by means of linear multiple regressions (LMR) has
become a standard procedure. This technique is common to the tradition of
research that Hammond and his colleagues (e.g. Hammond, McClelland and Mumpower,
1980) have labelled Social Judgment Theory (SJT), and to that which can be
traced to Meehl's (1954) work through such studies as those of Dawes (1971),
Goldberg (1970) and Hoffman (1960). These traditions of research have produced
an extensive body of findings which has been surveyed in a number of articles
and books, including Slovic and Lichtenstein (1971), Dawes and Corrigan
(1974), Hammond, Stewart, Brehmer and Steinmann (1975) and Brehmer and Hammond
(1977). A salient theme in this literature as a whole is a keen interest in
three related issues: capturing the judges' policies; determining the degree
of the judges' insight into these policies; and comparing the validity of the
decisions made by the LMR models of the judges with the validity of the
decisions made by the judges themselves. The overall pattern of the findings
reported is rather consistent. It suggests three main conclusions which can be
stated and concisely illustrated as follows: 1) LMRs yield efficient models of
the judges' policies (e.g. "a simple linear model will normally permit the
reproduction of 90-100% of [the clinical judges'] reliable judgmental variance",
Goldberg, 1968, p. 491; see also Hoffman, 1968, pp. 59-60; Einhorn, Kleinmuntz
and Kleinmuntz, 1979, p. 468). 2) Judges lack insight into their judgmental
policies (specifically, "... a number of studies, varying in the number of cues
that were available [have shown that] three cues usually sufficed to account for
more than 80% of the predictable variance in the judges' response ... One type
of error in self-insight has emerged in all of these studies. Judges strongly
overestimate the importance they place on minor cues (i.e. their subjective
weights greatly exceed the computed weights for these cues) and they
underestimate their reliance on a few major variables.", Slovic and
Lichtenstein, 1971; see also Hobson, Mendel and Gibson, 1981, pp. 181-182, for
the same conclusion based on the average use of four rather than three cues).
And 3) the models are usually more valid than the actual judgments from which
they were derived (and, thus, "One is left with the conclusion that humans
may be used to generate inference strategies but that once the strategy is
obtained, the human should be removed from the system and replaced by his own
strategy!", Dudycha and Naylor, 1966, p. 127, cited by Goldberg, 1970,
p. 431).


These generalizations have been occasionally qualified. Thus, it has been
shown that man can outpredict his model (Libby, 1976a). Man is also able to
simultaneously use at least eleven cues under laboratory conditions (Phelps and
Shanteau, 1978). A configural model may at times provide a better fit than a
linear one (Hoffman, 1968; Einhorn, 1970, 1971; Libby, 1976b). On the issue of
insight, Cook and Stewart (1975) found that statistical and subjective weights
were in good accord, in contradiction to the bulk of previous research (cf.
Schmitt and Levine, 1977, p. 16); Schmitt (1978) also found that statistical and
subjective weights correlated highly, although the statistical weights were
slightly but significantly superior to the subjective ones as predictors of the
subjects' judgments; Gray (1979, p. 30), using a single cue experimental
paradigm, found that the judges' insight was limited. He concludes that his
findings and the available evidence support the proposed generalization that
"people's effectiveness in predicting uncertain events exceeds their ability to
express insight into their prediction process".
There have also been developments in the opposite direction, notably the
sharpening of the proposition that man could usefully be replaced by his model.
The clearest expression of this trend is found in Dawes and Corrigan (1974) and
Dawes (1979) who have provided a rationale for using "improper" linear models
with equal weights. Analytical considerations as well as nearly a decade of
empirical work suggest that this recommendation cannot be lightly dismissed
(Einhorn and Hogarth, 1975; Dawes, 1979; Camerer, 1981).


On the whole, then, the three generalizations noted above appear to be well
substantiated and widely held (Slovic, Fischhoff and Lichtenstein, 1977;
Hammond, McClelland and Mumpower, 1980; Hogarth, 1980; Shapira, 1981); they are
moderately qualified, but more in the spirit of setting their practical limits
than of challenging their veracity.

The three issues of capturing the judges' policy, determining their
insight, and comparing the validities of man and his models, are often
empirically interrelated. Analytically, however, they are distinguishable. The
present research focuses on the first two issues, and deals only incidentally
with the third.


In well known papers, Hoffman (1960, 1968) and Darlington (1968) have
warned against the pitfalls of identifying paramorphic models or their
parameters with the psychological processes being modeled. Schmitt and Levine
(1977) have convincingly reiterated this point. The heart of the methodological
argument is that currently there is no single statistical index for reliably
and, therefore, meaningfully measuring (capturing) policies. This argument has
been so compellingly presented, both theoretically and empirically, that the
question arises as to the reason(s) behind the continued use of LMR models for
this purpose, or for the related one of providing subjects with a yardstick for
assessing their self-insight. The reason is certainly not theoretical in some
substantive sense, for the two schools of thought which originated this line of
research explicitly eschewed such a goal. Hoffman (1968) already went on
record some fifteen years ago to express his distrust of the results that
paramorphic modeling was documenting. Dawes' work has brought these reservations
and the connection between substantive [...] and the issue of statistical
robustness to their logical end. He views linear models as variously weighted,
additive indices, justified by the degree of their statistical efficiency, but
not presumed to paramorphically or otherwise model whatever aspect or level of
the judgmental process itself (Dawes and Corrigan, 1974; Dawes, 1979).
Similarly, Hammond and his colleagues (Hammond, McClelland and Mumpower, 1980,
pp. 61, 71, 105, 136) take great pains to make it clear that SJT has nothing to
say about human judgment per se. They repeatedly emphasize that the logic of
the cognograph and cognitive feed-back approach rests on what this school of
thought has been willing to extrapolate from what is in essence a learning
rather than a judgmental paradigm of research. This emphasis is in line with
the fact that the methodological pitfalls noted above concern primarily the
estimation of the β's and that Brehmer and his colleagues (Brehmer and
Qvarnstrom, 1976; Brehmer, Hagafors and Johansson, 1980) have shown that
capturing the judges' policy involves precisely the estimation of these
quantities rather than that of the less controversial correlation coefficients.
The foregoing position is also consistent with the prevalent use of the lens
model equation as reformulated by Tucker (1964) in terms of correlation
coefficients only.


Under these conditions of explicit lack of theoretical grounds or
patronage, the explanation for the continued "modeling" or capturing of policies
and the inferences about the judges' insight that they currently sustain can
only be conjectured; the best available clue is perhaps provided by the
rationale offered by the authors of a recent study. In the process of reviewing
some of the major weaknesses inherent in LMR modeling, Einhorn, Kleinmuntz and
Kleinmuntz (1979, pp. 467-468) remark that "... indeterminacy in estimating
weights when cues are correlated (Darlington, 1968), parallels the organism's
difficulty in this matter... [and] The inconsistency and random error in
judgment, resulting from the lack of cognitive control in executing one's
strategy ... is explicitly defined and measured within regression procedures."
They then go on to suggest that when LMRs are viewed in the light of such
characteristics, "... they seem neither arbitrary nor ad hoc nor devoid of
psychological content. Furthermore, the great success of such models in a wide
variety of tasks strongly suggests that some fundamental characteristic of
judgment has been captured..." The success alluded to is presumably the
remarkable ability of LMRs to explain most of the explainable variance of
judges' responses, an ability which, in contrast to the problematic beta
weights, has indeed remained largely unquestioned. In this perspective,
inferences about the judges' self-insight may be based on the discrepancy
between the small number of cues with which a LMR typically reproduces a
judge's decision and the more numerous ones that the judges report having taken
into consideration, rather than on the problematic comparison of β weights.
Under the circumstances, the conclusion will remain unchanged: man lacks
self-insight.


The aim of the present study is to document that to the extent that the
foregoing line of reasoning serves in this form or some related one as an
implicit or explicit justification for the continued attempts to capture man's
judgmental policies according to established practices, it is problematic in its
own right as well. Specifically, the number of profiles typically used in LMR
studies of human judgment and the fact that policies are not static (Brehmer,
1978; Bucuvalas, 1978), on the one hand, and the manner in which self insights
about the cues and weights used by the subjects are elicited, on the other,
combine to undermine and often invalidate such a rationale. The heart of the
problem is that the pivotal measures of consistency or cognitive control (R) and
the subjective information collected about subjects' reliance on cues can be
shown to be at times artifactual. Pessimism about man's cognitive consistency
and/or insight into his policies may nonetheless be warranted. The two findings
just alluded to suggest, however, that it is unsafe to infer this from current
LMR analyses, and under most circumstances, neither is it wise to expect these
analyses to be able to help remedy whatever cognitive shortcomings man
demonstrably has.


A peculiarity of the research to which we now turn to document the two
results just noted should be pointed out. The findings were accidentally
documented in a study which addressed different issues. Because by their logic
these findings are independent of many specific characteristics of a typical
policy-capturing study, they do not require a specially designed study for their
demonstration. For the sake of concreteness, they are presented with the data of
the study in which they were originally documented.


THE STUDY 


Overview 


The research under consideration involved eight subjects who acted as
individual judges. This investigation followed a pre-test which was conducted
with the aim of applying the standard LMR paradigm of analysis to the judgments
of both individuals and groups; the purpose was to investigate mismatches
between certain findings and intuition (e.g. the differential number of cues in
the subjects' models and in their own reports) by means of process-tracings.
As a result of this background, the study had a complex design with which we
need not concern ourselves here. Suffice it to note that each subject performed
a number of judgmental tasks over a period of about three weeks. The findings
and analyses which will be discussed pertain to the first of these judgmental
tasks.


Subjects


Six undergraduates, one accountant and one MD acted as subjects. They were
recruited through personal connections, and selected after having been made
aware that the experiment would last several weeks and might at times seem
repetitive. The subjects expressed their willingness to fully cooperate and were
paid over twice the usual hourly rate (a lump sum); it seems likely that their
motivation included an element of curiosity and of willingness to help provide
data for a scientific study.


The Task

The judges were presented with a set of 72 profiles. Each profile,
allegedly of a prospective undergraduate student, was to be judged in terms of
the likelihood (0-100) that the quality of the undergraduate work of this
candidate would be compatible with future graduate work. There were 16 cues per
profile providing the following information about each applicant: sex, age,
ethnic origin, I.Q., high school graduation grade, socio-economic background,
marital status, health, achievement expectations, nature of relations with high
school teachers, time spent doing homework during last year of high school,
[...], living expenses arrangements, political activities, social connections
with university staff members, and sociability. Some of the cues were
given quantitative values (e.g., age), others were described by quasi-interval
labels (e.g., no, some or intimate social relations with university staff
members). The cues were moderately interrelated, the average of the absolute
value of their intercorrelations being .132; the correlations ranged from -.71
to +[...], with the bulk of them (112 out of 120) ranging from -.31 to +.24.


Procedure 


The task was individually explained during a practice session with 3-4
profiles. It was then handed out to each subject to be performed at home; the
completed assignment was typically returned within 48 hours. Upon completion of
the experiment as a whole each subject was individually debriefed.


Results 


Table 1 presents the beta weights and the multiple correlations pertaining
to the equations of the eight subjects. The alternating rows, A, B, and C give
the results of three possible modeling decision-rules: inclusion of the
variables with a beta weight significant at the .05 level or better only (A);
inclusion of all the variables which contribute at least 1% of explained
variance to the equation (B); inclusion of all the variables which contribute
any measurable amount of explained variance to the equation (C). A shared
constraint is that the overall R2 of each equation be significant at the .05
level or better.


Insert Table 1 About Here


Choice of Equation

Each of these decision rules (selected for purposes which will become clear
as the paper proceeds) can be criticized.1 The aim here is not to repeat
arguments already made (Darlington, 1968) but to introduce the main discussion
by illustrating with the present data the kind of differences which can result
from making one choice rather than another. Thus, in the case at hand, the
average number of cues utilized which obtains for the three decision rules
A, B and C is [...], 7[...] and 14.7 respectively. A subject may therefore
arbitrarily end up being categorized as utilizing relatively few or many cues,
depending on the equation chosen; moreover, the beta weights which presumably
capture his policy correspondingly change. These results highlight the fact
that in any attempt to capture the policy of a judge one difficulty revolves
around the lack of an objective criterion for selecting the equation which
presumably best describes this policy. Note that since it is now recognized
that the stepwise multiple regression procedure recommended by Darlington (1968)
will often yield results which compound the problems attached to the
interpretation of the beta weights (Gordon, 1968; Cohen and Cohen, 1975), this
procedure is not by itself an acceptable solution. By the same token, but more
generally, any procedure relying on the amount of explained variance for
selecting without additional rationale or safeguard the most appropriate
equation, and by implication the most representative weights, is questionable.
The danger is that this criterion will generally lead to the indiscriminate
choice of the equation with the greatest number of variables, whether they are
relevant or not. This follows from the fact that in a multiple regression the
addition of a variable, even if redundant or irrelevant, can never reduce the
amount of variance explained; if the variable is utterly redundant it will have
no effect; if it is utterly irrelevant, it may have no effect, but often will
add a quantum of explained variance, however minute, to the equation owing to
chance relationships; one of the situations noted in footnote 1 can then arise.
When the sample size is held constant, the purpose of the adjusted R2 (which,
as far as it is concerned, may decrease; see discussion below) is precisely to
correct for the inclusion of unnecessary variables in this sense. Table 1
illustrates the effectiveness of this correction. Although the increase in R2
between equations A and C is on the average +5%, the corresponding difference
between adjusted R2's is negative (-.002); this trend is even accentuated when
equations B and C are compared. The misguided (and unparsimonious) strategy of
including all the variables which contribute any measurable amount of explained
variance in an equation has therefore been properly identified by the values of
the adjusted R2's.

Along such a line of reasoning, it could be argued that the most
appropriate equation for describing the policy of a subject should be selected
on the basis of the largest adjusted R2. This is in fact the logic of the
strategy advocated by Wonnacott and Wonnacott (1979). This suggestion, which has
an undeniable appeal, has the substantive disadvantage that all stepwise
procedures share (Gordon, 1968) and that the application under consideration
does not avoid (but could minimize, a point to which we shall return).

The issue of immediate interest, however, is that when such a strategy is
followed, it underscores an often overlooked characteristic of multiple
correlation coefficients. Specifically, the use of adjusted R2's makes salient
the fact that the difficulties involved in capturing the policy of a judge are
more severe than is commonly realized.
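The point that an added predictor can never lower R2, whereas the adjusted R2 is free to fall, can be illustrated with a small simulation. The sketch below is not part of the original study: the data are synthetic, the helper names r2_path and adjusted_r2 are hypothetical, and the adjustment is assumed to take the form (R2 - k/(n-1))(n-1)/(n-k-1). Sixteen simulated cues are entered one at a time for 72 simulated profiles; only the first three cues actually drive the simulated judgments.

```python
import random


def r2_path(y, columns):
    """Return R2 after each predictor is entered, in the order given.

    Each new predictor is residualized on the intercept and on the
    predictors already entered (Gram-Schmidt), so its contribution to the
    explained sum of squares is a squared quantity and cannot be negative.
    """
    n = len(y)
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]              # y after removing the intercept
    sst = sum(v * v for v in yc)
    basis, explained, path = [], 0.0, []
    for col in columns:
        m = sum(col) / n
        q = [v - m for v in col]              # remove the intercept
        for b in basis:                       # remove earlier predictors
            coef = sum(qi * bi for qi, bi in zip(q, b)) / sum(bi * bi for bi in b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        qq = sum(qi * qi for qi in q)
        if qq > 1e-12:                        # an utterly redundant cue adds nothing
            proj = sum(yi * qi for yi, qi in zip(yc, q))
            explained += proj * proj / qq
            basis.append(q)
        path.append(explained / sst)
    return path


def adjusted_r2(r2, k, n):
    """Assumed adjustment: (R2 - k/(n-1)) * (n-1)/(n-k-1)."""
    return (r2 - k / (n - 1)) * ((n - 1) / (n - k - 1))


random.seed(0)
n, n_cues = 72, 16                            # synthetic stand-in for 72 profiles, 16 cues
cues = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n_cues)]
y = [2 * cues[0][i] + cues[1][i] + 0.5 * cues[2][i] + random.gauss(0, 1)
     for i in range(n)]

path = r2_path(y, cues)
for k, r2 in enumerate(path, start=1):
    print(k, round(r2, 3), round(adjusted_r2(r2, k, n), 3))
```

In such a run the R2 column is non-decreasing by construction, so any downturn appears only in the adjusted column; whether and where it turns down depends on the simulated data.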




The Notion of Cognitive Control

To put the foregoing in a concrete context, consider the observation that
the policies of judges change during task performance (Brehmer, 1978; Bucuvalas,
1978). A reasonable question is to ask whether the change not attributable to
unreliability takes the form of ad hoc applications of procedures as the need
arises (Brehmer and Kuylenstierna, 1980) or of a more systematic change of
policy over time, perhaps as a result of the processes of chunking and
habituation.


A simple way to begin the investigation of this question is to split a
sample of judgments according to their sequential order. If there should be a
systematic over time change in policy and if the split is adequately made, the
equations developed within each subsample should exhibit an improved fit
over that found in the overall equation, within limits of sampling fluctuations.
Operationally, therefore, one would expect the individual, and in any case, the
average of the R2's of the equations developed within the adequately split
subsequences of judgments to be greater than the R2's of the equations developed
on the whole sequence. Conversely, if no systematic change in policy takes
place over time, no such expectation should be entertained.


The simplest possible sequential split is to separate the judgments into
two equal groups, in our case two subsamples of 36 profiles, according to the
order in which they were processed. If there should be in the present task only
one change in policy, and if it should typically take place about half way
during task execution (the process-tracing data suggest this might be roughly
the case), this procedure, admittedly a gross approximation, should nonetheless
help cast some light on the nature of policy change and policy routinization
over time.
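The logic of the sequential split can be sketched on synthetic data. The example below is an illustration under stated assumptions, not a reanalysis of the study: it uses a single hypothetical cue instead of sixteen, a simulated judge whose weight on that cue flips sign after profile 36, and a helper r2_simple introduced here for the purpose. It compares R2 on the whole sequence with the average R2 of the two sequential halves and, as a control, with the average R2 of an odd-even split.

```python
import random


def r2_simple(x, y):
    """Squared Pearson correlation, i.e. R2 of a one-cue linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


random.seed(1)
n = 72
half = n // 2
cue = [random.gauss(0, 1) for _ in range(n)]

# Hypothetical judge: weights the cue positively for the first 36 profiles
# and negatively for the last 36 (one policy change at mid-task).
judgment = [(2.0 if i < half else -2.0) * cue[i] + random.gauss(0, 0.5)
            for i in range(n)]

whole = r2_simple(cue, judgment)
seq = (r2_simple(cue[:half], judgment[:half])
       + r2_simple(cue[half:], judgment[half:])) / 2
odd_even = (r2_simple(cue[0::2], judgment[0::2])
            + r2_simple(cue[1::2], judgment[1::2])) / 2

print("whole sequence :", round(whole, 2))
print("sequential avg :", round(seq, 2))
print("odd-even avg   :", round(odd_even, 2))
```

Because each odd-even half mixes judgments made under both policies, only the sequential split isolates the two policies; with a real judge the contrast would presumably be far less stark than in this deliberately extreme simulation.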


Table 2 presents the equations of the subjects developed in such a manner.
In the interest of space, only one class of equations is presented. The
equations are those which correspond to the decision rule which yields the
highest adjusted R2 in Table 1, decision rule B. The findings in Table 2 and
the discussion which follows apply equally, however, in the case of the omitted
equations.


Insert Table 2 About Here


In terms of their β's, the twin equations are clearly very different from
one another; similarly, they differ very much from the comparable equation
developed on the whole sample for the same subject. Note, in particular, the
not uncommon shift of variables, as well as change in signs, which occurs
between the two sequential subsets of judgments.

In the light of the instability of the beta weights noted earlier,
including, as we have just seen, in the case of relatively slight variations of
[...] of the same equation (see Table 1), these results are neither
surprising, nor necessarily indicative of any substantive process.

The values of the R2's clearly suggest, however, that the policies during
the first and the second half of the judgmental task were distinctly different.
In terms of summary measures, the average of the R2's in the two object
subsamples is .78, as compared to .66 in the case of the parallel coefficient
for the single equations of Table 1; moreover, in every single case the former
average is greater than the R2 of the corresponding equation developed on the
whole object sample (see Table 2).


These values could be misleading; this is not unlikely owing to the
combined effect of sample size and number of predictors in the new equations.
The adjusted R2's which correct for these parameters (see Table 2, column 20)
suggest, however, that this is not the case. Although the adjusted values are
noticeably reduced, the finding remains unchanged; the averages of the adjusted
values of R2's corresponding to those in the previous paragraph are, indeed, .70
and .62 respectively.


This finding clearly has potential implications for work on the modeling
of policies and for the determination of the judges' insight into them. Because
of the importance of these implications, it is prudent to double check the
results. One way to accomplish this is to compare the results just obtained
with those produced by splitting up the samples randomly. Table 3 presents the
summary results of this analysis on subsamples divided by the odd even method;
the relevant data from Tables 1 and 2 are included for comparative purposes.


Insert Table 3 About Here


Two results are of interest. The first is that by this standard as well,
the evidence is that the subjects systematically used different policies during
the first and second half of the task. The average of the R2's in the sequential
subsamples is .78 versus .73 in the case of the random split subsamples; the
adjusted R2's, .70 versus .66, respectively, confirm these results (see the
two penultimate rows at the bottom of Table 3).2

The second is that the average across subjects of the adjusted R2's for the
randomly split subsamples (.66) is greater than the corresponding average for
the single equations (.62). This result, which holds systematically true within
subjects as well (with one exception; see Table 3, columns I-2 and III-2), makes
salient the often disregarded fact that the adjusted R2's may fail to adequately
correct for variations in number of predictors and sample size. It is
instructive to take a closer look at the reason for this failure.


Its source can be traced to the nature of the formula that is assumed to
correct for variations in the two foregoing parameters (i.e. to correct for
"shrinkage"). This formula, which adjusts for degrees of freedom, has several
related forms (e.g., Cohen and Cohen, 1975, pp. 106-107; Nie et al., 1975,
p. 355; Green and Tull, 1970, p. 351). Because of the transparence of its
structure, consider the form found in Wonnacott and Wonnacott (1979, p. 181):

$$\bar{R}^2 = \left(R^2 - \frac{k}{n-1}\right)\left(\frac{n-1}{n-k-1}\right) \qquad (1)$$

where $\bar{R}^2$ = adjusted squared multiple correlation, $R^2$ = obtained
squared multiple correlation, k = number of predictors in the equation, and
n = sample size. For instance, equation I for subject 1 in Table 2 yields an
$R^2$ of .82; with k=9 and n=36, the corresponding $\bar{R}^2$ is accordingly,

$$\bar{R}^2 = \left(.82 - \frac{9}{36-1}\right)\left(\frac{36-1}{36-9-1}\right) = .76 \qquad (2)$$
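The arithmetic of equation (2) can be checked with a short sketch. The function names below are hypothetical; adjusted_r2 transcribes the two-factor form discussed in the text, (R2 - k/(n-1))(n-1)/(n-k-1), and adjusted_r2_common is the algebraically equivalent form 1 - (1 - R2)(n-1)/(n-k-1) that many textbooks print instead.

```python
def adjusted_r2(r2, k, n):
    """Two-factor form: (R2 - k/(n-1)) * (n-1)/(n-k-1)."""
    return (r2 - k / (n - 1)) * ((n - 1) / (n - k - 1))


def adjusted_r2_common(r2, k, n):
    """Equivalent form: 1 - (1 - R2) * (n-1)/(n-k-1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Subject 1, equation I: R2 = .82 with k = 9 predictors and n = 36 profiles.
value = adjusted_r2(0.82, 9, 36)
print(round(value, 2))                              # 0.76, as in equation (2)
print(round(adjusted_r2_common(0.82, 9, 36), 2))    # same value from the other form
```

Both forms reduce to (R2(n-1) - k)/(n-k-1), which is why they agree.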


The logic of this adjustment becomes evident if we note that the expected R2
in a multiple regression where none of the predictors is actually related to
the dependent variable, i.e., where the true value of R2 = 0, will nonetheless
be equal on the average to

\[ E(R^2) = \frac{k}{n-1} \qquad (3) \]

This follows from the fact that one can get a perfect fit to n data points using
n-1 different predictors, independently of any other consideration (cf. Green
and Tull, 1970, p. 351). Thus, with k=9 and a sample of n=10, the expected
R2 is 1.0, even though the actual relationship may be 0. The first parenthesis
on the right hand side of equation (1) corrects for this inflation of R2 by
subtracting from it the quantity k/(n-1). If the predictors do bear some
substantive relationship to the dependent variable, this adjustment is overdone,
however. The reason can be seen by assuming that the criterion is actually
perfectly related to the predictors, that is, R2 = 1. The first parenthesis on
the right side of equation (1) then yields the value:

\[ R^2 - \frac{k}{n-1} = 1 - \frac{k}{n-1} = \frac{n-k-1}{n-1} \qquad (4) \]

which is necessarily smaller than one, while by hypothesis R2 = 1. Under these
conditions, we would nonetheless like equation (1) to yield an adjusted R2 of
1. To insure that this is the case, the quantity (4) must be appropriately
adjusted. This can be achieved by multiplying it by its inverse -- the
operation that the second term on the right hand side of equation (1) performs.
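The two reference points of this argument can be verified directly. The sketch
below is illustrative only (the function and variable names are assumptions):
it shows that equation (1) maps the chance level k/(n-1) onto 0 and a perfect
fit onto 1.

```python
import math


def adjusted_r2(r2, k, n):
    """Equation (1): adjust an obtained R2 for k predictors and n cases."""
    return (r2 - k / (n - 1)) * ((n - 1) / (n - k - 1))


k, n = 9, 36
chance_r2 = k / (n - 1)  # equation (3): expected R2 when the true R2 is 0

print(adjusted_r2(chance_r2, k, n))               # 0.0
print(math.isclose(adjusted_r2(1.0, k, n), 1.0))  # True
print(9 / (10 - 1))                               # 1.0: k = 9, n = 10
```

Any obtained R2 between these two points is rescaled linearly, which is all
that the two parentheses of equation (1) accomplish.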


The correction that equation (1) achieves is, however, approximate, for it
is not possible to determine exactly the degree of overestimation of R (cf.
Kerlinger and Pedhazur, 1973, p. 282). The approximate nature of the procedure
is probably best illustrated by noting that the adjusted R2 can be negative, in
which case it is by convention reported as 0 (Cohen and Cohen, 1975,
pp. 106-107). For instance, to use these authors' example, for R2 = .10, k=11
and n=100, equation (1) gives an adjusted R2 of -.0125.


The crux of the matter, then, is that equation (1) is only an estimate of
the likely value of R2 in the population. It gives a useful indication of the
probable effect of cross-validation on any given R2 as a function of the degrees
of freedom on which it was estimated. However, as the data in Table 3
illustrate, this correction is insufficient in the case of a stepwise analysis;
the reason is that this kind of analysis affects the degrees of freedom by
surreptitiously increasing the number of k's involved in the evaluation
procedure, a difficulty which has led to the suggestion of various heuristic
safeguards (Kerlinger and Pedhazur, 1973, pp. 282-283; Cohen and Cohen, 1975,
p. 107; Wonnacott and Wonnacott, 1979, pp. 186-187). There is no evidence,
however, of their being used in research on human judgment, despite the fact
that the cautious use of stepwise multiple regressions does present advantages,
and the observation that, whether warranted or not, this procedure is
commonly used for developing the models of the judges. Even more importantly,
there appears to be a compartmentalization regarding the use of adjusted and
unadjusted squared multiple correlations. While the adjusted R2 is increasingly
reported in recent research, this is done as an indication of the likely effect
of cross-validation on R2, rather than for the purpose of better assessing
cognitive control. Indeed, the central, and in many studies the only, measure
of the concepts of cognitive control and consistency remains the uncorrected
R2. This brings us to the heart of our present concern.


One important implication of the foregoing elaboration is that the nature
of the adjusted R2 highlights the fact that in its unadjusted form the
magnitude of R2 is in part a direct function of the values of k and n. An often
overlooked consequence of this relationship between R2 (and, as just noted, the
adjusted R2 in stepwise analysis) and these parameters is that the pitfalls
attached to the direct interpretation of the explained variance as a measure of
the strength of a relationship are not unlike those found in the case of
chi-square (χ2). Indeed, both types of measure reflect not only the strength of
a relationship, but also the size of the sample involved in estimating it. In
the case of χ2, the larger the sample, the greater the apparent relationship,
while for multiple correlations, the larger the sample, the smaller it appears
to be, other things being equal. (Incidentally, it is of interest to note that
an informal survey shows that sophisticated researchers quite familiar with LMR
techniques tend to have mistaken intuitions about the nature and direction of
this effect of sample size on R2.) Another, more important difference is that
the effect of sample size on the multiple correlation coefficient is for all
practical purposes bounded. As the sample size increases, the expected
shrinkage of this coefficient for any given number of predictors diminishes in
direct proportion to k/(n-1), while in the case of χ2 the sample size's effect
remains undamped.
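The bounded character of the sample size effect on R2 can be seen by
tabulating the chance level k/(n-1) for a fixed number of predictors. The
sketch below covers only the R2 side of the comparison; the value k = 5 and the
sample sizes are arbitrary illustrative choices.

```python
def chance_r2(k, n):
    """Equation (3): expected R2 when the true relationship is zero."""
    return k / (n - 1)


k = 5
for n in (10, 20, 50, 100, 1000):
    print(f"n = {n:4d}   k/(n-1) = {chance_r2(k, n):.3f}")
```

The printed values fall steadily toward zero, which is the sense in which the
artifact is damped as n grows.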


Let us now refocus our attention on Table 3.


The dependence of the magnitude of R2 on the values of k and n, and the fact
that for small ratios of k/(n-1) (i.e. for few predictors and large samples) the
effect of these parameters may become negligible, imply that the kind of
findings reported in Table 3 has to be interpreted in the light of the answers
to two questions. The first is whether the artifactual effects illustrated in
this Table are likely to be common in LMR research on human judgment. The
second concerns the practical implications of these artifacts.


Because equation (1) shows that the adjusted R2 is a function of k, n, and R2,
the answer to the first question depends on the magnitude of these quantities in
empirical research. One estimate (Hammond, McClelland and Mumpower, 1980,
pp. 132, 197) is that the typical values of k and n lie, respectively, between
5 and 8, and 20 and 50. With regard to R2, Camerer (1981) found that the
average √R2 in 13 studies was .74; Shapira's (1981) survey of 22 (partly
different) studies yields the value .78. Slovic and Lichtenstein (1971) provide
a more differentiated estimate. They note that the √R2's they examined were in
the .70's for complex, real-life judgments, while they were in the .80's and
.90's for the more artificial, laboratory-type judgmental tasks. The following
discussion integrates these estimates of the sizes of √R2's by preserving the
distinction that Slovic and Lichtenstein made on the basis of their detailed
review of the literature.


With this in mind, Table 3 justifies two conclusions. The first is that
the trends documented on the basis of the subsamples of n=36 each, and the
average number of cues in the equations developed on them of 6.6 (see column
I,3), are unlikely to be atypical. The second is that owing to the size of the
R2's (√R2 = .85, on the average; see Table 3, column I,1), the magnitude of the
artifacts is probably more representative of that found in laboratory-type
research than in studies involving complex, real-life judgmental tasks or
issues. Because the magnitude of the error is an inverse function of that of
R2, the size of the artifact will be greater in the latter case. The extent of
the expected difference is illustrated in Table 4.

Insert Table 4 about here
For the sake of legibility, the relevant data have been organized into
consecutive submatrices. The headings of these submatrices, adjusted R2 = .30
to adjusted R2 = .80, give the "true" values of explained variance that the
selected values of the obtained R2's listed in the corresponding submatrices
yield by application of equation (1) or, equivalently, the values that the R2's
listed in the submatrices yield by a reverse application of equation (1) (i.e.
adjusted R2 given and R2 unknown) for combinations of values of k and n. For
instance, the first entry in the first submatrix of Table 4 indicates that an
R2 of .48 obtained when k and n were 5 and 20, respectively, is likely to be in
fact .30; conversely, a "true" squared multiple correlation of .30 is likely to
have a value of .48 if it is estimated with 5 predictors on a sample of 20
profiles and, looking at the first entry of the last row of the same submatrix,
a value of .37 if it is estimated with k = 5 and n = 50 (the discussion is
notional and assumes that the values of R2 in the headings of the submatrices
are not biased by a stepwise procedure of estimation of the R2's).
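The reverse application of equation (1) described above amounts to solving it
for R2. A minimal sketch follows; the function name obtained_r2 is an
assumption, and the grid printed at the end is illustrative only, since the
exact k and n levels used in Table 4 are assumed rather than known.

```python
def obtained_r2(true_r2, k, n):
    """Equation (1) solved for R2, given the adjusted ("true") value."""
    return true_r2 * (n - k - 1) / (n - 1) + k / (n - 1)


print(round(obtained_r2(0.30, 5, 20), 2))  # 0.48
print(round(obtained_r2(0.30, 5, 50), 2))  # 0.37

# Illustrative grid for the submatrix headed by .30 (assumed k and n levels).
for n in (20, 30, 40, 50):
    print(n, [round(obtained_r2(0.30, k, n), 2) for k in (5, 6, 7, 8)])
```

The first two printed values reproduce the two entries quoted in the text for
k = 5 with n = 20 and n = 50.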


If for the purpose of clarity we trade precision for simplicity, and take
√R2 = .90 as a point estimate to represent the typical range of values found
in laboratory-type judgmental tasks and √R2 = .75 to represent that found in
the more complex, real-life ones, the trends in Table 4, together with the main
point made in the foregoing analysis, lead to the following conclusions.


Firstly, the effect of object sample size is greater across levels of k's
than is that of the number of predictors across levels of n's. That is to say,
in the range of values of k and n under consideration, a change in the size of
the object sample tends to be more consequential than a change in the number of
cues, whatever the level of cognitive control considered.


Secondly, for the laboratory-type tasks in which R2 = .81 (√R2 = .90) and up,
the magnitude of the artifact that relating undifferentially to k in the range
5 to 8 and to n in the range 20 to 50 may introduce in estimating a subject's
cognitive control, or in comparing the findings produced by different studies,
is on the whole relatively small. The last submatrix (with values of R2 ranging
from .82 to .88, and adjusted R2 = .80) shows, indeed, that when cognitive
control reaches such a level, the maximum fluctuation in explained variance is
3% when 8 rather than 5 cues (or vice versa) are used in a model, 5% when the
object sample size changes from 20 to 50 (or vice versa) and 6% when both
changes occur concurrently and additively (see the right hand diagonal of the
submatrix under discussion). While these maximal values are not insignificant,
the lesser magnitude of the other possible variations (some of which are 0
because of rounding necessities) may be regarded by the criterion suggested
earlier as being of a magnitude where the advisability or not of distinguishing
between discriminability and substantive significance is a matter of opinion.


Thirdly, for tasks in which subjects typically exhibit a √R2 in the .70's,
that is, to focus the discussion, where R2 = .56 (√R2 = .75), Table 4 shows
that an empirical value of this magnitude is compatible with a true coefficient
of cognitive control ranging from adjusted R2 = .30 to adjusted R2 = .50. The
submatrix headed by adjusted R2 = .50, which best, and most conservatively,
approximates the distribution of R2's having the notional value of .56 of
interest, indicates, moreover, that in this case the maximum fluctuation in
explained variance is 8% (as compared to 3% in the previous case) when k varies
from 5 to 8, 13% (versus 5%) when the object sample size varies from 20 to 50,
and 16% (versus 6%) when both changes occur concurrently and additively.
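The maximal fluctuations quoted for the submatrices headed by .80 and .50
follow from the corner entries of each submatrix. The sketch below recomputes
them from equation (1) solved for R2; the function names are assumptions, and
the entries are rounded to two decimals, as they are in a printed table.

```python
def obtained_r2(true_r2, k, n):
    """Equation (1) solved for R2, given the adjusted ("true") value."""
    return true_r2 * (n - k - 1) / (n - 1) + k / (n - 1)


def max_fluctuations(true_r2, k_lo=5, k_hi=8, n_lo=20, n_hi=50):
    """Largest differences between corner entries of one submatrix."""
    def cell(k, n):
        return round(obtained_r2(true_r2, k, n), 2)

    top = cell(k_hi, n_lo)  # most cues, smallest sample: largest entry
    return {
        "cues": round(top - cell(k_lo, n_lo), 2),
        "sample": round(top - cell(k_hi, n_hi), 2),
        "both": round(top - cell(k_lo, n_hi), 2),
    }


print(max_fluctuations(0.80))  # {'cues': 0.03, 'sample': 0.05, 'both': 0.06}
print(max_fluctuations(0.50))  # {'cues': 0.08, 'sample': 0.13, 'both': 0.16}
```

The lower the true value, the larger every one of the three differences,
which is the inverse relationship between R2 and the size of the artifact.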


It is probably noncontroversial to state that in this case neither the
maximum potential magnitudes of the artifactual component of R2, nor several of
the lesser values it can have, can be safely disregarded; nor can the wide
range of imprecision (.20 of "true" explained variance) regarding the magnitude
of the underlying coefficients of actual cognitive control (in this connection,
another aspect of the effect of the size of R2 may be noted; if we adhere
strictly to the notional value of .81 discussed earlier, Table 4 shows that it
is found in one submatrix only, that headed by the value adjusted R2 = .70.
That is to say, in the framework of the gross categories of Table 4, the
imprecision shrinks in this case to 0, which by comparison with the previous
value of .20 underscores the effect of the level of cognitive control on the
pitfalls attached to the unguarded measurement of this concept).


Practically speaking, the seriousness of the foregoing artifacts depends on
the magnitudes we have just documented; their consequentiality also depends,
however, on the manner in which the artifacts tend to come about in actual
research. That is to say, to assess their actual implications it is also
necessary to have an idea of the conditions under which the quantities which
determine the size of the artifactual component of R2, namely k, n, and the size
of R2 itself, vary in empirical research in a potentially damaging fashion.


Consider first k. There seem to be at least two main ways in which the
values of this parameter can undergo changes conducive to misleading inferences.
The first is common in the situation where two equations developed for different
judges on identical profiles (i.e. with the same set of supplied cues) and on an
object sample of identical size, are directly compared. Under these
circumstances, two subjects with identical true scores of cognitive control
could, nay, are likely to end up being categorized as having different degrees
of cognitive consistency, merely because their policies might involve a
different number of variables, i.e. might require for their expression a
different number of predictors in each equation. Similarly, the same subject
studied on the same number of cases with profiles involving the same number of
cues, but about a different real-life substantive issue, could end up with
different scores of cognitive consistency simply because of the number of cues
he might happen to need to express -- exactly as well -- each of his policies.
The second way in which k can change with confounding consequences is less
insidious. The likely occurrence can be illustrated with a hypothetical study
of transfer and generalization of the effect of cognitive feedback that one may
be tempted to carry out. In such a study, it could appear useful to design the
second task with a different number of cues. If this second task should include
more variables, and under the assumption of a monotonic relationship between
number of cues provided and number of cues used in the judgmental task, a
training session of this kind could be expected to produce a gain in cognitive
consistency, if for nothing else, because of the direct relationship between
the size of the artifactual component of R2 and the number of predictors in a
model.

Consider now sample size. We have seen above that the confounding effect of
this parameter is greater than is the case for k. The pertinence, arbitrariness
or accidental nature of the considerations which lead to the determination of
the object sample size come also more readily to mind in this case owing to our
sensitization to the issue of sample size in general. Thus, a probably shared
experience is that these considerations include primarily the time available for
the experiment, a typical value being one hour with students fulfilling a course
requirement -- with sometimes a follow up session of one more hour, often used
for validating purposes and debriefing. When research money is available, the
limiting consideration appears to be the anticipated information-processing
capability of the paid subjects; sessions are then more likely to extend to
1 1/2 or 2 hours, with as many additional sessions as necessary. For real life
tasks the decisive factor is commonly the anticipated cooperation of the
prospective judges -- which may in turn be a function of the social relations
and/or the rapport of the researchers with them. That is to say, depending on
the means available to a researcher and on his perception of the patience of
his subjects, the measured cognitive consistency of a judge can typically vary
by as much as is made statistically possible by halving the size of an object
sample, and even more than that if the combinations of sample size and of later
validation by the split-half method are taken into account. On the whole, the
range of sample sizes of n=20 to n=50 may express the manner in which the
considerations and constraints just noted lead to the typical object sample
sizes found in the literature.


Be this as it may, when the artifactual effect of sample size runs the risk
of reaching the levels illustrated in Table 4, and when some of the controlling
factors of this risk can be as irrelevant to the subjects' actual cognitive
consistency as those we have just noted, it is clear that comparing levels of
cognitive control across models without ascertaining the equality of the object
sample sizes on which they were developed can be hazardous. As Table 4 shows,
this hazard grows in direct proportion to the difference in sample sizes. It is
noteworthy, however, that irrespective of the exact difference between n's, the
probability of making misleading comparisons is further facilitated under the
present circumstances by the fact that the sample size artifact operates in the
same direction as does a seemingly compelling expectation. On statistical
grounds, the greater the object sample size, the smaller R2 is expected to be;
similarly, from a substantive perspective, the greater the object sample size,
the lower the degree of cognitive control one expects from the judges, and hence
the smaller the intuitively expected R2. A decrease in the size of R2 observed
in the context of a longer task is therefore likely to be interpreted as a
decrease in the subject's cognitive control, despite the fact that part or all
of such an "effect" is a necessary statistical consequence.


In short, not only are the artifactual values of R2 we can expect to find
in LMR research on human judgment likely to be at times of a magnitude we cannot
safely disregard, but the manner by which these artifacts tend to come about
suggests that they could be widespread.


Let us now summarize the main points of the foregoing discussion. 


The pivotal observation is that the notion of cognitive control or
consistency as measured by R2 is ambiguous and problematic to an unexpected
degree. For the typical range of values of k and n found in LMR research on
human judgment, this measure is artifactually affected to a significant degree
by variations in the values of these parameters and of the obtained R2's. Table
3 (columns I,1 versus III,1) concretely illustrates the magnitude of these
confounding effects, which range in this case from 0% to 21% of explained
variance. The analyses based on Table 4 show that these values are recognizably
close to theoretical expectations. Table 4 also permits one to phrase the
problem differently. Thus, this Table indicates that stating that one subject
exhibited a degree of cognitive control of, say, R2 = .58 while that of another
one was R2 = .71, may simply convey information about the object sample sizes
used in the two conditions. On the other hand, stating that two subjects have
the same degree of cognitive control of, say, R2 = .71, may conceal the fact
that despite the identity of this measure and the fact that the two judges were
studied under identical conditions and on the same task (n=20), they actually
have different true scores of cognitive control, adjusted R2 = .50 and adjusted
R2 = .60, which translate into differences of -.21 and -.11, respectively, with
the obtained scores, a fact which is hidden by their different policies,
involving in one case 8 cues and in the other 5.
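Both hypothetical cases can be reproduced from equation (1) solved for R2. The
sketch below is illustrative; the function name is an assumption, and the
sample size of 50 in the first case is inferred from the range of n discussed
above rather than stated in the text.

```python
def obtained_r2(true_r2, k, n):
    """Equation (1) solved for R2, given the adjusted ("true") value."""
    return true_r2 * (n - k - 1) / (n - 1) + k / (n - 1)


# Same true score (.50) and same k = 8, but n = 50 versus n = 20.
print(round(obtained_r2(0.50, 8, 50), 2),
      round(obtained_r2(0.50, 8, 20), 2))  # 0.58 0.71

# Different true scores (.50, .60), same n = 20, but 8 cues versus 5.
print(round(obtained_r2(0.50, 8, 20), 2),
      round(obtained_r2(0.60, 5, 20), 2))  # 0.71 0.71
```

In the first case a difference appears where there is none; in the second a
real difference of .10 disappears.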




These problems of interpretation of R2 as a measure of cognitive control
are compounded by the fact that Tables 2 and 3 suggest that subjects may be
changing their policies over time in a systematic way. This points to the
importance of determining the number of profiles which constitute a natural
subset or block for capturing the policies between changes -- an endeavor which
could turn out to be idiosyncratic for at least some combinations of tasks by
subjects. Because the range of typical sample sizes may involve one or more such
natural blocks, the magnitude of R2 is also likely to reflect the chance
overlap, or lack thereof, between appropriately determined subsamples in this
psychological sense, and the actual set of profiles on which a model happens to
have been developed. The way the over-time change combines with k, n, and the
levels of obtained R2 in affecting the artifactual component of this coefficient
is a topic which deserves an analysis in its own right. Presently, it suffices
to note that the larger the object sample, the greater the likelihood that R2
will also be reduced on this account as a result of the mixture of policies in a
single equation; note the implication that the dynamism of psychological
processes and the statistical requirement of large n's may be working at cross
purposes for the needs of modeling.


When we consider the effect of all the foregoing factors, either
individually or in combination, the question evidently arises of the meaning and
usefulness of R2 as a measure of cognitive control, both to the researcher and
to [illegible]. Except in a narrow range of values near the upper limit of its
size, the absolute magnitude of this coefficient, as well as the possible
changes observed in its size, are manifestly ambiguous to the point where its
interpretation for either theoretical or applied purposes is far from
compelling.


The analysis has assumed all along that there is some underlying human
characteristic operationalized by the adjusted R2 which represents the
subjects' true cognitive control and that R2 presumably measures. It could be
argued, of course, that there are no psychological grounds to expect the notion
of cognitive control to be invariant across combinations of values of k and n,
even in the modest range considered. This, however, is obviously not the manner
in which R2 has been used and reported in the literature on LMR modeling of
human judgment. Such a view, moreover, raises with even greater acuity than do
the uncertainties and ambiguities discussed above the conceptual issue of what
exactly is meant by the notion of cognitive control that R2 presumably
measures -- and that the adjusted R2 does not.


The crux of the matter, then, is that as a coefficient of cognitive control
or consistency R2 is a measure which, in its current use for communicative
purposes as well as in its practical applications, conveys information which is
extremely difficult to interpret. Uncritically related to, this coefficient may
therefore have little informative value; worse even, it runs the serious risk
of being plainly misleading.


Although this is perhaps obvious, it may be useful to note that from a
statistical viewpoint all that has been said above about R2 as a measure of
cognitive control or consistency applies with equal strength to the
interpretation of this coefficient as a measure of fit, that is, as a measure of
success in capturing a judge's policy (cf. Hammond, McClelland and Mumpower,
1980; [illegible], Murphy and Marques, 1982). Indeed, whether the size of R2 is
attributed to the ability of the judge or to that of the modeler is irrelevant
to the operation and magnitude of the artifacts that we have discussed; clearly,
this only affects the substantive process which is in danger of being
misinterpreted.

Additional Results

We have seen earlier that according to one's choice of the equation which
is deemed to represent a subject's policy, the number of cues in the model can
widely vary (specifically from 5.5 to 14.7 cues on the average per equation in
the empirical example discussed in Table 1). This problem, together with the
ambiguities attached to the squared multiple correlation coefficient when it is
used for the purpose discussed above, as well, as just noted, as in its use as a
measure of success in capturing a judge's policy, leads to a self evident
conclusion. It is that the grounds for determining whether a subject has or not
self-insight into his policies are much less solid than is commonly assumed.


Nonetheless, once an equation with its R2 is selected by whatever
criterion, the assessment of the subjects' insight requires that data be
obtained about what they feel their policies were. Several data gathering
methods have been tried, yielding very similar results (Cook and Stewart, 1975).
One of these methods, the scarcity scale, appears to have become standard
procedure (Schmitt, 1978). The following discussion will focus on this scale.


Eliciting the Subjects' Self-Insight


Consider briefly the structure of the scarcity scale method. The procedure
consists of instructing the subjects to allocate 100 points among the cues in a
manner which reflects these variables' relative importance in the set of
judgments just completed. Response-wise, the subjects tend to carefully comply.
They are scrupulous in two senses. Firstly, they are careful that the points
allocated do add up to 100. Secondly, they attempt to allocate weights to all
the variables that they have considered. Some subjects are scrupulous in the
first sense only, and do not allocate weights to "minor cues", although they are
often hesitant about this and ask whether it is permissible. But many appear to
interpret the instructions to include the second request as well, and attempt to
allocate weights to all the variables, no matter how minute the discrimination
they have to make and how uncertain they are about it. This part of the task is
typically characterized by growing signs of hesitation, including erasures and
the use as a last resort measure of some arbitrary rule for allocating a few
points among the cues left over, or for redistributing the points so that every
cue is included in the allocation.


By its logic, this extensive type of allocation readily brings to mind the
equations of type C in Table 1 where, it will be recalled, all the variables
which can potentially enter into an equation are in fact forced into it by the
nature of the decision rule. It was noted at the time that this is a misguided
procedure. It seems peculiar, therefore, that we should unwittingly put our
subjects in a structural situation where they are forced, in fact, to do what
should not be done, neither during model development, as we have noted earlier,
nor in all probability during insight elicitation, as we have just indicated.
To put it differently, the findings obtained by means of the scarcity scale
method could constitute a typical case of demand-response. This brings us to the
second major topic of interest in this paper.

To explore it, an alternative conception of the structure of the data which
is required to assess self-insight is desirable, for the situation just
discussed is compounded by the fact that subjects keep insisting that their
judgments are configural, and that they relate to clusters of variables rather
than to individual cues. That is to say, we cannot simply rely on the use of
the most important of the subjects' introspective weights if we wish to go
beyond the computational aspect of the demand response issue, and address the
potentially even more consequential substantive one.

That subjects do often relate to combinations of cues is a common
observation. For instance, in the case at hand, process tracing during the
pretest showed that age, achievement expectations, high school grade and ethnic
origin could be regarded by a subject as indicators of motivation, the concept
he might say he was trying to infer at that particular moment. More generally,
the data also suggest that what the subjects actually attempt to do at this
level falls into two categories of information processing: the interpretation
of cues by means of other ones, and the inference of core variables or
underlying "factors" (e.g. motivation) by means of subsets of cues, some of
which act as stable indicators, while others vary as a result of the
aforementioned interpretations.
In terms of our current concerns, gathering data about this dual process,
especially about the stable underlying clusters, and assessing the weights of
these "factors" in the judgments, is clearly one possible alternative way of the
kind alluded to above to confront the subjects' claim and to evaluate the extent
of their self-insight into their policies. This approach to tapping the
subjects' self-insight will yield an index similar to that produced by the
standard subjective data about the respondents' reliance on individual cues, to
which it is intended to be compared. This creates a problem in that there is no
ready-made procedure for comparing the merits of indices. It would seem that a
reasonable set of criteria for the assessment of interest could include the
following: the size of the correlation between the predictions based on the
weights derived from the two types of self-insights and the actual judgments;
the simplicity/complexity of the two indices; their theoretical construct
validity; and their usefulness within the modeling process.
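The first of these criteria can be sketched as follows. The data, the function
names and the scoring rule (a weighted sum of cue values, with the 100 allocated
points rescaled to proportions) are all hypothetical; they are one plausible
reading of "predictions based on the weights", not a description of the
procedure actually used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def predict(profiles, points):
    """Weighted sum of cue values; allocated points are rescaled to sum to 1."""
    total = sum(points)
    return [sum(p / total * c for p, c in zip(points, profile))
            for profile in profiles]


# Hypothetical data: 5 profiles described by 3 cues, and one judge's ratings.
profiles = [[1, 0, 2], [2, 1, 0], [0, 2, 1], [2, 2, 2], [1, 1, 0]]
judgments = [3, 4, 2, 6, 3]
points = [60, 30, 10]  # hypothetical 100-point allocation among the cues

r = pearson(predict(profiles, points), judgments)
print(round(r, 2))
```

The same two functions could be applied to cluster scores and cluster weights,
so that the two self-insight indices are compared on an identical footing.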


These criteria will be implicitly applied to the results of the analyses
presented below.

Data Collection


The data were gathered by means of the instructions that standardly
accompany scarcity scales. Specifically, after completion of the judgmental task
the subjects were presented with a list of the 16 cues used in each profile and
asked, in the case of individual cues, to "Please allocate among the cues 100
points in a manner which reflects their relative weights in the judgments you
made." After performing this assignment, the subjects were given a new list of
the same cues with the following instructions: "Please consider again the 16
items of information. You may have related to clusters of them rather than to
individual items in making your judgments. If so, indicate next to each variable
with which others you used it as a rule, by employing a common symbol for each
grouping -- say, different numbers. If an item of information was used in
several clusters, write down next to it the identifying symbol (number) of all
the groupings to which it belonged".


After the completion of this task came the request to “Please give names to
your groupings.” This was followed by the concluding instruction to “Please
allocate among these groupings 100 points in a manner which reflects the
relative weight of each cluster in your judgments.”
In all cases the foregoing instructions were sequentially handed out and
explained to the subjects in a face-to-face session with an experimenter.


Subjects


Data are available for four subjects only. The idea of asking subjects to
indicate how they had clustered the cues, if at all, and the weights they gave
to these factors in their decisions emerged serendipitously during the
debriefing of subject number 2 (she sparked the idea by spontaneously
volunteering some of this information in the course of her justification of her
objections to the questions asked about individual cues). The idea quickly
crystallized, and was operationalized in time for the debriefing of subject
number 4 -- its first application. All subsequent subjects were asked the
foregoing questions about clusters. Subject number 6, however, is an exception
owing to a personal misfortune which interrupted her participation in the study
just prior to being administered the self-insight scales. Hence the
availability of data for subjects number 4, 5, 7, and 8 only.


Procedure


The general procedure for computing the predicted scores was the following.
In the case of insight about individual cues, the z score of each cue (properly
signed -- the sign having been taken from the subjects’ multiple regression, see
Table 1) was multiplied by its weight (zero, if the subject had disregarded the
cue), and the predicted score for a given profile was the sum of these
products. In the case of clusters, the z scores of the cues constituting a
grouping were first summed (after having been properly signed, as above), and
each grouping was multiplied by its self-insight weight. The predicted score of
a given profile was the sum of these products. Note that by this procedure all
the cues defining a concept were given equal weight -- obviously a gross (but
conservative) oversimplification.
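
The two scoring rules just described can be sketched as follows. This is only
an illustration under stated assumptions: the function names, the toy z scores,
the signs and the weights are all hypothetical and are not taken from the
study's data.

```python
import numpy as np

def cue_index(z, signs, weights):
    """Predicted score per profile: sum over cues of signed z score x weight.
    A weight of zero drops a cue the subject reported disregarding."""
    return (z * signs) @ weights

def cluster_index(z, signs, clusters, cluster_weights):
    """Sum the signed z scores inside each cluster (equal weight per cue),
    multiply each cluster sum by its self-insight weight, then add up."""
    signed = z * signs
    total = np.zeros(z.shape[0])
    for members, w in zip(clusters, cluster_weights):
        total += w * signed[:, members].sum(axis=1)
    return total

# Hypothetical toy data: 3 profiles x 4 cues (illustrative values only).
z = np.array([[ 1.0, -0.5,  0.0,  2.0],
              [ 0.0,  1.0, -1.0,  0.5],
              [-1.0,  0.0,  1.0, -0.5]])
signs = np.array([1, -1, 1, 1])          # assumed regression signs
cue_weights = np.array([50, 30, 0, 20])  # 100 points spread over cues
clusters = [[0, 1], [1, 3]]              # cue 1 belongs to both clusters
cluster_weights = [60, 40]               # 100 points spread over clusters

cue_scores = cue_index(z, signs, cue_weights)
cluster_scores = cluster_index(z, signs, clusters, cluster_weights)
print(cue_scores, cluster_scores)
```

Correlating either vector of predicted scores with the actual judgments gives
the kind of figure reported in Table 5.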


Findings


A. Prima facie validity of the “factors” indices. Columns I, II and III of
Table 5 show that for each of the four subjects the judgments predicted on the
basis of self-insight about the concepts inferred yield higher correlations with
the actual judgments than is the case when the comparable predictions are made
with data about individual cues. The tentative conclusion which emerges,
therefore, is that information about clusterings of cues may be an alternative
way of investigating the subjects’ self-insight into their judgments, a way
which in addition to being theoretically grounded appears to be empirically
justified. Note that this conclusion is in essence similar to that reported by
Cook and Stewart (1975), who also found that probing the subjects’ self-insight
about interaction effects -- albeit about individual cue utilization rather than
for configural concept inference (operationalized here by a simple additive
index) -- yielded predictions which tended to correlate higher with the actual
judgments than did alternatively derived predictions.


Insert Table 5 about here
A related question is that of self-insight with regard to the substantive
information taken into account (whether cues or clusters), versus self-insight
concerning the weights given to this information. One way to address this
question is to recompute the indices used in columns I and II, with exactly the
same cues (properly signed, as before), but this time without weighting them
prior to summation. Columns IV and V of Table 5 present the correlations
between actual and predicted judgments obtained with the indices recomputed in
such a manner. The finding of interest which emerges is that the pairwise
differences between these columns (see column VI) yield a picture which is
practically the negative image of that found in column III. That is to say,
disregarding the weights that the subjects give to the clusters leads to a
greater relative loss of predictive power than in the case of individual cues.
This trend suggests that the subjects, or at least some subjects, may have more
self-insight about the weights they give to the concepts they use in their
judgments than to the cues which evoke them. This fact evidently has
implications for the assumptions embedded in the instructions commonly given to
the subjects in studies of their self-insight.


It is noteworthy that the foregoing interpretation receives some small but
non-negligible support from examining the question of the tangible consequences
of the information provided by the subjects about their self-insight. This issue
is briefly examined below by using the subjects’ self-insight to see whether it
can be used to improve the statistical model (see Table 1, decision rule B).
One procedure to achieve this end is to add to the equations the variables
reportedly having been used, but which are not in the statistically developed
equations; in the case of individual cue utilization, this merely involves
adding (forcing in) the cues the subjects feel they have used and which are not
in the LMRs. For the concepts inferred, the procedure requires computing
interaction terms representing each cluster, forcing into the equations the cues
involved in these terms which are not yet in the model, and testing whether the
interaction terms add any explained variance to this recomputed base-line
(Cohen and Cohen, 1975, Ch. 8).
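
The hierarchical step described above (force in the cues of a cluster, then ask
whether an interaction term raises the corrected explained variance) might be
sketched as below. Everything here is an assumption made for illustration: the
data are synthetic, the interaction is taken to be a simple product of two
cues, and `adjusted_r2` is a hypothetical helper, not the routine used in the
study. The sample size of 72 merely mirrors the number of profiles judged.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R2 of an ordinary least-squares fit of y on X.
    An intercept column is added inside the function."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Synthetic judgments with a built-in interaction between cues 0 and 2.
rng = np.random.default_rng(0)
n = 72
cues = rng.standard_normal((n, 3))
y = (cues[:, 0] + 0.5 * cues[:, 1]
     + 1.5 * cues[:, 0] * cues[:, 2]
     + 0.3 * rng.standard_normal(n))

base = cues                                   # base-line: cues forced in
product = (cues[:, 0] * cues[:, 2])[:, None]  # assumed interaction term
full = np.column_stack([base, product])

gain = adjusted_r2(full, y) - adjusted_r2(base, y)
print(gain)  # increase in adjusted R2 attributable to the interaction
```

A gain of at least one percentage point is the threshold the text applies when
reading column VIII of Table 5.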


Column VII in Table 5 shows the result of this analysis for the subjects’
self-insight about individual cue utilization. The result is that in no case
does the corrected measure of explained variance rise above the level previously
achieved with the statistically developed model. In other words, in the
occurrence the self-insight of the subjects under consideration is useless for
attempting to improve on this model. In the case of clusters, on the other
hand, column VIII shows that in three of the four cases at least one interaction
effect does increase the corrected amount of explained variance by at least one
percentage point. While this figure is admittedly low, the results are in line
with the magnitudes we habitually encounter in the studies reporting success in
detecting interaction effects by means of standard statistical analyses
([illegible], 1968; Glivie and Schmitt, 1979). In terms of the substantive
process of interest, however, this modest finding, together with the pattern of
the data exhibited by Table 5, suggests that the subjects’ self-insight about
concept inference is likely to be grounded in reality; as noted earlier, this
conclusion is by no means new (Cook and Stewart, 1975). However, the phrasing
of the question by means of which it has been replicated here does have
implications for the problem at hand. Specifically, the subjects’ use of
information just examined recasts in a very different light the famous
discrepancy between the number of cues the subjects typically report taking
into account and that which is sufficient for accounting for the bulk of the
variance of their judgments. This becomes clear below.


B. The demand-response issue. The comparison between columns I and II of
Table 6 shows that subjects report inferring relatively few concepts, between
3 and 5 in the present case, as opposed to an average of over 12 cues when the
question they answer is put to them in terms of individual cue utilization. That
is to say, the image which obtains regarding the size, and even the very
existence, of the discrepancy noted above is antithetically different depending
upon one’s choice of perspective: number of individual cues or number of
concepts used. Note, moreover, that if the comparison is momentarily kept at the
level of individual cues only, the subjects indicate that they use fewer cues
when the question is phrased in terms of underlying concepts than when it is
phrased in terms of individual cues. Specifically, in contrast to an average of
12.75 cues in the latter case, they report using an average of 10 cues (if we
recount a cue each time it is used in a different cluster), or an average of
8.25 cues (if those relied upon are counted only once, independently of the
number of concepts on which they “load”, to use an enticing analogy; see Table
6, columns I, IV and III, respectively). This latter finding is of enough
importance to warrant a brief comment.


Insert Table 6 about here


There appear to be two main reasons which could explain the difference
between the average number of cues found in column I (12.75) on the one hand,
and in columns III and IV (8.25 and 10, respectively) on the other. One
possibility is that when asked to indicate the clusters to which cues might
belong, the subjects have omitted to mention those they may have used as single
measures of concepts. The other is that the difference, especially that between
8.25 and 12.75 cues, may be indicative of the demand-response effect
hypothesized earlier -- even when subjects do not interpret the instructions to
mean that weights should be allocated to every cue; in such a case, they could
tend to interpret the instructions as a request to make an effort to allocate
weights, if not to every cue, at least to the maximum possible number of them.
The data for subjects 5 and 8 are compatible with such an interpretation (see
Table 6, column I versus columns III and IV).

Although the merits of the two foregoing (non-exclusive) explanations cannot
be decided with the data at hand, the impressionistic evidence available from
the debriefings suggests that the second explanation comes closer to describing
what is actually happening.


Be this as it may, we have to contend with the fact that depending upon the
instructions given, subjects report using 3 to 5 concepts in their judgments or
an average of over 12 discrete cues. It could be argued that this comparison is
unwarranted. Indeed, it can be held that the figure of 12.75 cues (Table 6,
column I) should be compared with that of 10, or at least with that of
8.25 cues (Table 6, columns IV and III, respectively). While this is a tenable
position, this remark does not affect the essence of the argument made in this
section, but simply rephrases it. The reason is that this criticism implies a
call for a proper comparison, a request to which it is only appropriate to
respond by pointing out that to be consistent the foregoing figures should in
fact be compared with those in Table 1, especially with those of equation C.
In particular, the foregoing data and argument lend support to the view that, at
the level of individual cues, the correct comparison is probably between man’s
average of 12.75 (Table 6, column I) and that of his “C” models of 14.70 (see
Table 1, first row of summary data); the similarity of the order of magnitude
of these two figures hardly requires emphasis. Moreover, it is of interest to
note that the figures even suggest that, contrary to the prevalent imagery, the
LMR procedure may be less discriminating with regard to the marginally relevant
predictors it includes than are the human judges.


C. Transitional remarks. The crux of the foregoing discussion is that in
order to be valid, the conclusion that man lacks self-insight should rest on
comparisons which are internally consistent. In the light of the preceding
remarks, it appears that appropriate comparisons could include the following:
firstly, the comparison of the factors extracted from the analysis of the
matrix of cue intercorrelations of a task with the clusters obtained from
answers to questions such as those illustrated above; secondly, the comparison
of the “best” cues of a LMR with the subset of those selected by the subjects as
also being the most important, either by their response to a direct question,
or as implied by the relative weights given to the cues in the process of
allocating among them the points of the scarcity scale.


To the best of my knowledge, neither of these internally consistent methods
has been applied in studies of man’s judgmental self-insight. In the case of
objective and subjective factors, there is a practical obstacle: the data
required for such a comparison are rarely available. The reason is that the cue
intercorrelations are typically predetermined by the researchers, and factor
analyzing them would produce an objective picture of the clusters set up by the
experimenters, rather than one of the subjects’ objective clustering policy.
Rare studies where this is not the case include those by Phelps and Shanteau
(1978), [reference illegible], and Einhorn and Koelb (1982). These researches --
none of which, incidentally, directly address this issue of self-insight -- are
instructive in that they suggest that in the kind of tasks under discussion
subjects may typically infer between three and six or seven concepts. By this
standard, the figures obtained in column II of Table 6 are compatible with the
view that the subjects may have a rather accurate perception of the number of
clusters/concepts that they use in the type of judgmental tasks under
consideration (although it must be noted that the difference between the
findings reported by Phelps and Shanteau, 1978, and those documented in the two
other studies just referred to suggests that the number of concepts inferred may
also be a function of the number of cues presented).

In the case of objectively versus subjectively identified “best” cues,
however, feasibility is not an obstacle. One simply needs to set up a procedure
for selecting the most important subjective cues. In the present case, the
method chosen for constructing the abridged subjective indices which will be
shortly analysed takes advantage of the information available about subjects’
clustering. Thus, for each respondent, the number of clusters he or she had
reported to have used serves as the theoretical rationale for determining the
number of individual cues to be included in the condensed index. Operationally,
these cues and their subjective weights come from the responses to the probe
about individual cue utilization (cf. Table 6, column I). In each case, the
number of cues -- derived, as just noted, from the respondent’s clustering,
e.g. 4 for subject number four (see Table 6, column II) -- was selected by
including the cues subjectively given the highest relative weights, or by random
choice among the equally weighted ones in case of a tie for the last needed
cue(s). The z scores of the selected cues were then properly signed, as
described earlier, prior to being multiplied by their subjective weight and
summed. In such a manner, the numbers of cues incorporated into the selectively
recomputed indices range from 3 to 5, with an average of 4.25 cues per index
(cf. Table 6, column II); in contrast, the standardly computed indices range
from 7 to 16, with an average of 12.75 cues per index (Table 6, column I).
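
The selection rule for the abridged indices can be sketched as follows. The
function name `abridge` and the example point allocation are hypothetical; the
sketch only assumes what the text states, namely that the k highest-weighted
cues are kept and that ties for the last place(s) are broken at random.

```python
import random

def abridge(weights, k, rng=None):
    """Return the (sorted) positions of the k cues with the highest
    subjective weights; ties for the last needed cue(s) are broken at random.
    Cues given zero points are never selected."""
    rng = rng or random.Random()
    used = [i for i, w in enumerate(weights) if w > 0]
    ranked = sorted(used, key=lambda i: weights[i], reverse=True)
    if len(ranked) <= k:
        return sorted(ranked)
    cutoff = weights[ranked[k - 1]]            # weight of the k-th ranked cue
    sure = [i for i in ranked if weights[i] > cutoff]
    tied = [i for i in ranked if weights[i] == cutoff]
    return sorted(sure + rng.sample(tied, k - len(sure)))

# Hypothetical allocation of 100 points over 8 cues (not study data).
points = [30, 5, 20, 20, 15, 10, 0, 0]
top3 = abridge(points, 3)   # no tie at the cut: cues 0, 2 and 3
top2 = abridge(points, 2)   # cue 0 plus a random pick between cues 2 and 3
print(top3, top2)
```

The selected cues would then be signed, weighted and summed exactly as in the
full index, with k taken from the number of clusters the respondent reported.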
D. Subjects’ self-insight revisited. The findings resulting from the
foregoing procedure can be summarized as follows. The average squared
correlation of the abridged indices just described with the actual judgments is
.38. By comparison, the average r2 of the non-abridged indices with the actual
judgments is .46 (cf. Table 5, column I). The observation of importance is that
the ratio of these figures is .82, a result which is very close to the widely
quoted conclusion of Slovic and Lichtenstein (1971, p. 684; see also Hobson,
Mendel and Gibson, 1981) that a few selected cues identified by LMR analyses
generally suffice to account for over 80% of the explained variance of the
subjects’ responses. As it turns out, these authors’ conclusion sounds more
unusual than it is, for the indices about man’s self-insight appear to behave in
exactly the same manner. That is to say, it is possible to speculate that for
many empirically developed indices 3 or 4 items might well turn out to explain
most of the variance that enlarged indices might be able to explain (see
supportive indication below). Be this as it may, the finding just discussed
indicates that the results documented in the context of LMR analyses cannot
serve as an uncritical basis for inferring that subjects lack self-insight into
their judgments, in particular about the number of cues they use. It is of
interest to note that this conclusion is buttressed by the data of the other
three subjects for whom data about individual cue utilization are also
available -- although without information about clustering for guiding the
choice of the number of cues to include in the abridged indices. The absence of
these guidelines turns out, however, not to be critical. Taking the three most
important cues of each subject for building, out of necessity, uniformly
abridged indices leads, indeed, to the explanation of 79% on the average of the
variance of the actual judgments explained by the full-fledged indices; with
indices based on the four most important cues, the figure rises to 82%.
Evidently, the important implication of this finding is that the indices
measuring man’s self-insight may be quite insensitive to the decision rule used
to abridge them (i.e. a uniform number, or one derived from other
considerations, e.g. evidence about clustering), as well as to the exact number
of cues selected to recompute them -- in the range of roughly 3 to 5 for the
type of tasks under discussion.


Recapitulation


The findings presented in this section rest on a limited data base, and
must therefore be regarded as preliminary; nonetheless, they are instructive.
In conjunction with the underlying argument these findings can be summarized as
follows.


The subjects that we have studied assert that they utilize between 3 and 5
concepts in the fairly typical judgmental task that was used in the present
study. The alternative mode of eliciting their self-insight which produced
these data appears to be grounded in reality; in particular it can be used to
improve the predictive power of straight LMR models and yields predictions
which compare favorably with those derived from the traditional indices based on
the subjects’ reliance on individual cues.


The figure of 3 to 5 concepts is incompatible with the accepted view that
judges tend to misperceive the extent to which they rely in their decisions on a
few major variables. The likelihood that the prevalent portrayal of people’s
judgmental self-insight is mistaken is further buttressed by the fact that a
closer examination of the way data are gathered for building the traditional
indices suggests that the procedure may induce a demand-response effect.
Moreover, merely being consistent and computing predicted judgments with
these standard indices exactly as one does in the case of LMRs, that is, by
using selected cues according to the magnitude of their weights, shows that the
“best” subjective cues appear to behave exactly as do the objective ones; in
both cases 4 ± 1 cues explain about 80% or more of the variance explainable with
the full-fledged indices or the extended LMRs.


In short, whichever yardstick one adopts for evaluating man’s self-insight
-- the number of concepts inferred, or the number of cues that is sufficient for
explaining the bulk of the variance explainable by the enlarged indices -- there
do not appear to be any grounds for asserting that LMR research provides
evidence that man confuses all the cues he processes with the most important
variables he actually takes into account in making his judgments. He may sorely
lack self-insight. But, as I have endeavored to show, the results produced by
the traditional type of analysis of his capability in the context of LMR
judgmental tasks cannot provide the evidence necessary to establish this fact.
The reason lies in part in the need to properly conceptualize what is being
measured. In part it lies in the insufficient attention paid to the necessity
of making comparisons which are internally consistent. And it also lies in the
fact that in a world characterized by multicollinearity and monotonic
relationships, most indices could well turn out to behave as do LMRs, that is,
increasing either the number of index items or that of the variables entered
into a LMR might well lead in both cases to rapidly diminishing returns, perhaps
at comparable rates.


Be this as it may, one overall conclusion stands out -- however the
reader assesses each of the two measures of self-insight considered in this
section and ranks them in the light of the four criteria discussed earlier. The
conclusion is this: the manner in which the subjects’ self-insight is commonly
elicited and computed raises as many questions concerning its meaning and
validity as do the values of the parameters embedded in the LMR models discussed
earlier in terms of which this self-insight is appraised (cf. Part One of the
analysis).


DISCUSSION 


We have seen that the meaning of policy capturing by means of LMR models
turns out to be very ambiguous. This stems from the fact that there are several
plausible rules for choosing between equations, none of which is compelling or
standardly agreed upon by students of human judgment by means of LMRs. Often
the decision rule used to select the equation(s) deemed to capture the subjects’
policies is not even reported. In the judgmental task that we have considered,
the degrees of freedom resulting from this situation translated into equations
which could arbitrarily include from 5.5 to 14.7 cues per equation, on the
average.


To compound the problem, a close examination of R2 shows it to be a very
problematic measure of the notion of cognitive control or of success in fitting
a model to the judgmental data. This statement does not overstate the case for
the typical range of values of n and k found in LMR research of human judgment,
although it does so for a narrow range of values of R2 near the upper limit of
its maximum magnitude (Table 4). Outside this range, however, that is for the
bulk of complex, real life tasks, the foregoing characterization is justified.
Note, incidentally, that it is precisely under such conditions that accurately
measuring cognitive consistency for providing cognitive feedback is not merely
of academic interest.


Consistent with the foregoing results, it is worth noting that, stepping
outside the limits of the present research, we find that across a sample of
published studies, R2, n, and k are related in the predicted statistical manner.
Thus, in 21 studies reviewed by Shapira (1981) for which data about the
aforementioned parameters are presented, we find the following correlations:
1) between R2 and sample size, -.52, and between R2 and number of cues, +.27;
2) between R2 and sample size, controlling for number of cues, -.61, and between
R2 and number of cues, controlling for sample size, +.44. The anticipated
artifactual relationships thus emerge as a clear trend.
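
The step from the zero-order to the controlled correlations above follows the
standard first-order partial correlation formula, sketched below. The
correlation between sample size and number of cues is not reported in the text;
the value of .20 used here is purely an assumption, chosen only because it
yields partial correlations close to the reported -.61 and +.44. The function
name `partial_r` is likewise hypothetical.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_R2_n = -0.52  # reported: R2 with sample size
r_R2_k = 0.27   # reported: R2 with number of cues
r_n_k = 0.20    # ASSUMPTION: sample size with number of cues (not reported)

# R2 with sample size, controlling for number of cues
p_n = partial_r(r_R2_n, r_R2_k, r_n_k)
# R2 with number of cues, controlling for sample size
p_k = partial_r(r_R2_k, r_R2_n, r_n_k)
print(p_n, p_k)
```

Under this assumed value, controlling for one parameter strengthens the
apparent relation of R2 with the other, which is the pattern the text reports.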


Under these circumstances, it is evident that referring to the number of
cues in a LMR or to the equation’s R2 as to descriptive measures of the judges’
policies is far from being as enlightening as it has come to be held to be.


Moreover, and although we have not touched upon these topics, three related
problems make the informative value of current LMR analyses of human judgment
even more problematic. The first is that cue intercorrelations and cue
redundancy affect very significantly the value and stability of beta weights
(for an excellent analysis of this problem, see Gordon, 1968). The spreading
representative design philosophy (Brunswik, 1955a, 1955b; Hammond and Wascoe,
1980; Hammond, McClelland and Mumpower, 1980) of profile construction may
therefore be in fundamental conflict with the methodological requirements of LMR
modeling of human judgment -- except when one deals with a judgmental task whose
ecology is well understood and documented. The second is that whatever the
merit of the estimated weights, there is no agreed upon standard way to report
them. As is well known, the possibilities, which include [expressions
illegible] (Einhorn and Koelb, 1982; Hobson, Mendel and Gibson, 1981; Hoffman,
1968), yield measures which do not necessarily rank order the cues in the same
order of importance (Darlington, 1968). The third is that if the findings
pertaining to r2 can be extrapolated to R2 (as some empirical evidence suggests
may be the case -- see Goldberg 1976, Table 1), the number of categories
included in the judgmental response scale further complicates the situation by
introducing another way whereby the quantity of explained variance may be
arbitrarily affected to a significant degree. Thus, in the bivariate case, the
effect takes the form of a systematic reduction of the explained variance; the
latter shrinks increasingly as the number of response categories diminishes and
as the size of r2 grows. To illustrate, the same relationship which would yield
a value of r2 = .65 with a five point response-scale will produce one of
r2 = .76 with a ten point response-scale, or vice versa (see Martin, 1973,
Table 1). This effect, which compares in magnitude with those previously
documented but which is independent of them, is therefore potentially strictly
additive. Moreover, its impact reaches its maximum as the amount of explained
variance approaches its limit, that is in the range of values where on the basis
of Table 4 one might have concluded that, because of the reduced effect of k and
n on the explained variance, it becomes relatively safe to relate to R2 as to a
measure of cognitive control or of success in capturing a judge’s policy.
Obviously, this restricted assumption, too, is unsafe.

Another result further obscures the meaning of current LMR models of
policies. This is the finding that the assumption that subjects can be
characterized by a single policy over time may be questionable. We have seen
that the equations developed on the whole object sample and those developed on
sequential subsets of it exhibit differences which include the variables that
characterize the policies, their weights, signs, and the amount of (adjusted)
explained variance already noted.


That is to say, not only are there problems of measurement, but the very
notion of policy that LMRs presumably capture turns out to be elusive. What is
the proper amount of profiles and/or time spent judging them which yields a
meaningful image of a subject’s policy, or at least reflects a natural segment
of it? Evidently, what the analysis of data gathered during a typical
experimental session produces is often a statistical average which need not
bear a direct resemblance to any of the policies involved, in particular the
latest one being implemented. Under such circumstances, the yardstick used for
supplying cognitive feedback to the subjects is obviously very problematic.


For the purpose of assessing the judges’ self-insight, the foregoing
difficulties, which are inherent in the nature of LMR models of human judgment,
are compounded by those created by the manner in which data about self-insight
are elicited. In particular, one possibility is that the findings documented to
date involve a demand-response effect. Thus, taking the subjects’ claim that
they relate to clusters of cues rather than to individual ones as a working
hypothesis, one finds that in the present judgmental task their intuition is
that they utilized between 3 and 5 conceptual variables in their decisions.
These values are at variance with the image of people being unaware of the
extent to which they rely in their judgments on a few major variables.
Moreover, constructing traditional indices which are simply consistent with the
LMR computations with which they are to be compared reproduces the famous LMR
finding, namely, that a few selected cues suffice to account for about 80% or
more of the variance explained by the full-fledged set of cues.


In sum, the grounds and rationale for building LMR policy-capturing models
and for providing subjects with feedback to improve their cognitive awareness
and/or cognitive control turn out to be questionable in the extreme. This holds
true in terms of the objective model, in terms of the subjective data gathered
about self-insight, and with regard to the manner in which the two are then
compared. In light of this situation, it is inescapable that the original aim of
policy-capturing cannot be said to have been achieved. This aim was stated by
one of its pioneers to be the confrontation of the problem from which all others
may be held to stem, namely, that judgment is a process that we cannot trace:


“It is as if we put our empirical data into a computing machine, the pro- 
cesses of which we did not understand and which frequently produced dif- 
ferent results depending on which machine we used and when we used it.” 


(Hammond, 1955, p. 255).


This characterization of judges evidently applies as well, if not better,
to the proposed solution -- the present day LMR models. Indeed, there is little
doubt that for many applications, we have replaced one black box by another. It
is intriguing, therefore, to observe that the use of LMR models for cognitive
feedback is spreading (Hammond, [illegible], Mumpower and Adelman, [year
illegible]) and appears to be well received by the subjects, sometimes with
impressive results (Hammond and Adelman, 1976; Anderson et al., 1961). On the
background of the current shortcomings of the approach that we have discussed,
weaknesses which take their full meaning in the light of the explicit
theoretical disavowals noted in the introduction, one is puzzled by the
situation which has developed. It could be that its explanation lies in the
subjects’ favorable reaction, which it is tempting to regard as an implicit
indication that something is fundamentally right in the endeavor, in spite of
all the present arguments to the contrary. But it is clear that this reaction of
the subjects should be checked for a possible artifact which appears to have
been completely overlooked in the literature -- despite its being a familiar
one. This artifact can be operationalized by the following questions: How would
the subjects react to, and accept, randomly generated models substituted for
theirs? Or, similarly, how would subjects relate to their own models if,
adapting one of Milgram’s (1974) research designs, they were presented to them
not at the terminal of a computer, and in an academic context, but in a less
authoritative environment, and without the computer aura? Without evidence to
the contrary, it is hard to escape the feeling that what we may possibly be
witnessing is the effect of the principle that for any audience, sufficiently
sophisticated methodology is indistinguishable from magic, to paraphrase an
aphorism quoted by Parker (1976, p. 1).


There is a whole literature dealing with models developed for the purpose
of predicting an objective criterion, rather than for that of reproducing or
capturing a judge's policy on which our discussion has focused. Many of the
issues that we have raised have been discussed in this literature. In
particular, the question of the sample size and of the number of predictors in an
equation have been discussed or noted by Einhorn and Hogarth (1975; 1982),
Dorans and Drasgow (1978), and Keren and Newman (1978). Similarly, the problem
of the instability of the beta weights has been stressed in this context by many
researchers since Darlington (1968), including the authors just referred to,
Schmidt (1972), Schmitt and Levine (1977), and Schoemaker and Waid (1982).
With regard to R2, Cattin (1980) has recommended the general use of a corrected
measure of this coefficient and suggested a more accurate way of computing R2
when n 50. The issue of self-insight, too, has been investigated in terms of
the subjects' ability to predict an objective criterion. To date, the evidence
for this kind of self-insight is mixed, with findings which sometimes reflect
favorably on man's ability, and sometimes less so (Schmitt and Levine, 1977;
Schmitt, 1978; Gray, 1979; Schoemaker and Waid, 1982), in part, perhaps, because
of the effect of the values of n and k used in the models (Cattin, 1980, p.
413).
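
To make concrete what a correction for degrees of freedom does to R2, the
following minimal sketch applies the common textbook adjustment,
1 - (1 - R2)(n - 1)/(n - k - 1). This is an assumption made for illustration
only: it is not necessarily the corrected measure recommended by Cattin (1980),
nor the adjustment used in the present study. The function name and the example
values of R2, n and k are hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Degrees-of-freedom-corrected R2 (standard textbook form).

    r2: unadjusted squared multiple correlation
    n:  number of judged profiles (observations)
    k:  number of cues (predictors) in the equation
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for the correction to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical illustration: the same R2 = .70 obtained with k = 7 cues,
# once on 72 profiles and once on a 36-profile subsample.
for n in (72, 36):
    print(n, round(adjusted_r2(0.70, n, 7), 3))
```

Under this formula the same unadjusted R2 shrinks more when n is small
relative to k, which is the sense in which the values of n and k may affect
the comparison of findings across studies.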


The implications of the findings and warnings found in this literature have
not been generalized to the activity of policy-capturing and insight
determination, however. Modeling policies is therefore an endeavor which currently
not only lacks a theoretical justification, but which also involves many serious
practical problems. The result is that policy capturing is presently an
activity with a very questionable rationale. The quality of cognitive feedback
given to the judges is anyone's guess. And whether or not man has
self-insight into the policies he applies in his judgments remains a cluttered and
unsettled issue, despite the pivotal importance of this question for some
research (Stillwell, Seaver and Edwards, 1981).


From a remedial perspective, some of the problems that we have considered
are procedural, e.g. number of response categories in the judgmental scales,
replacement of R2 by a measure corrected for degrees of freedom, manner in which
data about self-insight are elicited and analyzed. Others are inherent in the
LMR methodology, for instance the fact that -- to use Kerlinger and Pedhazur's
(1973, p. 442) words -- "A serious weakness of multiple regressions is what can
be called the unreliability of regression weights."


The first class of problems can be readily dealt with without great
difficulty, for irrespective of whether or not they have a solution in some
sense, they can be fairly effectively controlled by holding their effect
constant. On the other hand, the second category of problems requires serious
analytical and methodological work which may take years to bear fruit. It is
possible for some time to "look at the other side of the coin", to continue
quoting Kerlinger and Pedhazur (1973, p. 444 ff.), for LMR's have indeed
strengths which could be taken advantage of, if proper caution is exercised. In
the long run, however, it seems clear that if LMR modeling of human judgment
for policy-capturing purposes is to be regarded as a justified endeavor, the
validity of the models as a reflection of the judges' policy must be
demonstrable. Here the likely equivalence of various types of weights for
predictive purposes clearly defines the problem that must be confronted: in the
light of the current inability of policy-capturing models to effectively compete
with these alternative schemes in terms of relative (and at times absolute)
levels of achievement, one of the two following alternatives must evidently be
faced. The first is to convincingly capture man's judgmental policies, and thus
show that what is offered as an aid to the limitations of his self-insight is
valid feedback, and not some mixture of elements of a policy with statistical
and methodological artifacts, the latter inextricably intertwined with, and
possibly completely overshadowing, the former. The other is to accept the
prospect that the endeavor may increasingly be seen as devoid of a defensible
scientific justification.

The seeds of this conclusion are implicitly found in the points made in
the literature on predictive linear models to which we have referred above.
These points have tended to be discussed as discrete issues, however, a fact
which explains perhaps the lack of sufficient attention paid to them for the
topic at hand. This assumption underlies the present attempt to bring them
together and to spell out their implications for current policy-capturing work.
Looking back at the picture which emerges from the discussion, it is not
far-fetched to say that these implications could be regarded with some justification
by a critic of policy-capturing work as suggesting that the king is currently
naked.


Lest this remark be misunderstood, let me hasten to add that I emphatically
do not believe that this is the case. I am convinced that policy-capturing
research is theoretically important, and that the aim of providing subjects with
cognitive feedback is of practical significance. More importantly, past studies
which have avoided some or all of the pitfalls that we have discussed have made
significant contributions to our knowledge. The point of the foregoing remark is
to call attention to the fact that once this has been said there is, however,
only so much which can be accomplished without confronting the key problems
involved in the LMR modeling approach. It is in this perspective that it is
important to realize that the list of threats to the validity of current
policy-capturing work and its applications creates a situation which is not a
strategic one for complacently continuing to carry out research in the habitual
way, and thus risking the prospect of having to face the conclusion just noted.
Put differently, the insights and findings accumulated to date clearly need to be
scrutinized and reconsidered in the light of the potential threats to their
validity that we have discussed.




We have considered a number of problems which constitute validity threats to
current LMR models of human judgments. These include:

1. The lack of unambiguous criteria for including variables in an LMR model
and for choosing the equation deemed to reflect the judges' policy, a
problem which creates an unstructured situation that needs to be given
attention and corrected.


2. The necessity of agreeing on an interpretation of the notion of
weight, and of presenting evidence of their stability; at least that of
reporting the weights in a manner which permits one to recompute them
differently within the LMR paradigm.


3. The need of documenting the effect on R2 of the number of categories in
the response scales given to the subjects.


4. The necessity of clarifying what is meant by cognitive consistency and
by success in modeling a judge's policy. In particular, the need of
examining whether the adjusted R2 should replace R2 as a standard measure of
these notions. And, if so, which method should be used to correct for the
biases involved in the stepwise development of equations.


5. The necessity of determining the appropriate sample size for model(s)
building, in light of the evidence that subjects may systematically
change their policies in the process of judging a set of profiles.

And, after clarifying and improving the models along such lines, and in the
process increasing their claim to trustworthiness,


6. The need of eliciting the subjects' self-insight according to a
theoretical conceptualization of the process investigated. At least, that of
eliciting the relevant data in a manner which does not induce a
demand-response effect, and of computing the predictions derived from
self-insight in a fashion which parallels in its logic the procedure used
with LMR models.


In sum, one of the central problems of LMR models of human judgment is that
for the typical values of n and k used for model building, regression weights
are unreliable. Their stability, however, is of crucial importance for the
tenability of the assumption that a policy has been captured and deserves,
therefore, to be fed back to the subjects. With the beta weights defaulting, a
remaining indicator of the quality and stability of a model is potentially the
value of R2, as we find, indeed, that this coefficient is used in the literature
-- in general implicitly, but on occasion explicitly (Hammond and Marvin, 1981).
However, we have seen that this coefficient can be seriously misleading for a
number of reasons. These include the object sample size used for model
development, the number of variables included in the model, the number of
categories in the response scale, etc. The interpretation of current LMR models
of human judgment is therefore in many cases uncompelling. This situation is
compounded by problems in the manner in which the subjects' self-insight is
elicited and analyzed. Together these difficulties raise fundamental questions
about the accuracy of the characterization of human information processing
derived from this evidence, about the validity of the portrayal of the subjects'
self-insight it sustains, and about the usefulness of the feedback provided to
the subject.
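
One way to see how the sample size and the number of variables alone can
inflate R2 is the standard normal-theory result that, when the k predictors are
in truth unrelated to the judgments, the expected sample R2 of a regression with
an intercept is k/(n - 1) rather than zero. The sketch below merely evaluates
that expression; the function name and the example values of n and k are
hypothetical and are not taken from the analyses of this paper.

```python
def expected_null_r2(n: int, k: int) -> float:
    """Expected sample R2 when the k predictors are truly unrelated to the
    judgments (population R2 = 0), under the usual normal-theory assumptions
    for a least-squares regression with an intercept."""
    if n <= k + 1:
        raise ValueError("need n > k + 1")
    return k / (n - 1)


# Hypothetical illustration: k = 7 cues fitted on 72, 36 and 18 profiles.
for n in (72, 36, 18):
    print(n, round(expected_null_r2(n, 7), 3))
```

Halving the number of profiles while keeping the number of cues fixed roughly
doubles the R2 that pure noise would be expected to produce, which is one
reason why an unadjusted R2 is a fragile indicator of a captured policy.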


This situation needs evidently to be remedied. By spelling out the extent
to which it is problematic, this paper will hopefully contribute to the
stimulation of the necessary corrective work. In the meantime, it should serve as
a warning against accepting with too much faith some of the conclusions which
stem from current LMR models uncritically related to and used as dependable
descriptions of human judgmental policies.


FOOTNOTES


1. This holds true for most decision rules owing to the fact that we can
encounter situations where R2 is statistically significant, while none of
the tests for the individual X's are, and conversely, situations where the
t tests for one or more individual predictors are statistically
significant, while the overall R2 is not (see Cohen and Cohen, 1975, section 3.7,
especially pp. 108-199). Not surprisingly, proposals to safely deal with
these problems are open to the criticism that they are overly conservative.




2. One problem in comparing the R2's of independent equations is that
the significance of differences between amounts of explained variance is
not readily determinable. One common heuristic procedure under the
circumstances is to regard a difference of 1% of explained variance as
noteworthy. However, many researchers often seem to interpret this rule of
thumb as meaning that such an amount is noticeable, rather than necessarily
of substantive importance. As a result, there is a zone of ambiguity in
the interpretation of the significance of a gain/loss of explained variance
which extends at times to 2-3%. Workers specializing in the use of LMR's
often resolve it (especially for large values of R2) by applying the
following principle: a difference equal to or larger than 10% of the
quantity 1-R2 is regarded as "significant". In the case at hand, the values of
the R2's under consideration spread around .70; consequently, (1-R2)/10 =
3%. By this criterion, a difference of approximately this size, or
greater, between two R2's may be regarded as being unambiguously
"significant". These guidelines are offered with no stronger claim for
them than the fact that they are commonly used and may be useful to fix
ideas.
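
The rule of thumb just described can be written out as a short sketch. The
function names are hypothetical, and one detail is an assumption: the footnote
does not say which of the two R2's enters the quantity 1-R2, so the first
(reference) value is used here.

```python
def r2_difference_threshold(r2_ref: float, fraction: float = 0.10) -> float:
    """Smallest difference regarded as "significant" under the rule of thumb:
    a given fraction (10% by default) of the unexplained variance 1 - R2."""
    return fraction * (1.0 - r2_ref)


def is_noteworthy(r2_ref: float, r2_other: float, fraction: float = 0.10) -> bool:
    """True if the two R2's differ by at least the threshold.

    Assumption: the threshold is computed from r2_ref, the reference equation.
    """
    return abs(r2_other - r2_ref) >= r2_difference_threshold(r2_ref, fraction)


# With R2's spread around .70, the threshold is about .03 (3%).
print(round(r2_difference_threshold(0.70), 3))
print(is_noteworthy(0.70, 0.75), is_noteworthy(0.70, 0.71))
```

Like the guideline itself, this is offered only to fix ideas; it is a
heuristic and not a test of statistical significance.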


3. In practical terms this task can be carried out in two ways. The first
approach is to get continuous data on the process, a task which turns out to
be very difficult and cumbersome owing to both problems of data recording
and analysis. Note that this difficulty is also encountered when data are
gathered about individual cues; this has led in this case to the current
use of the scarcity scale method referred to above. In particular, the
subjects are requested to allocate points in retrospect, i.e. the data elicited
are about the weights of the cues psychologically averaged after the fact across
profiles. This method is used despite its shortcomings, both substantive and
procedural (cf. Ericsson and Simon, 1980), the overriding consideration being
its practicality, and the supporting rationale that in matters of insight there
is also a substantive interest in the validity of insights as recalled and
communicated in retrospect. Along a similar line of reasoning, it can be
argued that if the information about clusters is conceptually important and
different from that about individual cues, and if this difference is
observable and robust, the application in this case as well of the second
(retrospective) approach just noted could yield informative results -- despite its
acknowledged imperfections. At the very least, the results would be directly
comparable with those obtained about individual cues. In a nutshell, this is
the rationale on which the forthcoming analyses rest.


4. Because, as we have seen, cue signs may change according to the data base used
for model building, this procedure assumes that the sample size used in Table 1
is an appropriate one for the purpose at hand, an assumption which is of course
open to question. However, the main point of interest of the findings to be
discussed turns out to be independent of this assumption. This will become
clear as the more general analysis and argument presented later will show.




REFERENCES 


Anderson B.F. et al.
1981    Second Report to the Rocky Flats Monitoring Committee Concerning
        Scientists' Judgments of Cancer Risk. Center for Research on
        Judgment and Policy, Report No. 233, University of Colorado.

Brehmer B.
1978    "Response Consistency in Probabilistic Inference Tasks."
        Organizational Behavior and Human Performance, Vol. 22, pp.
        103-115.

Brehmer B., R. Hagafors and R. Johansson
1980    "Cognitive Skills in Judgment: Subjects' Ability to Use
        Information About Weights, Function Forms and Organizing
        Principles." Organizational Behavior and Human Performance,
        Vol. 26, pp. 373-385.

Brehmer B. and J. Kuylenstierna
1980    "Content and Consistency in Probabilistic Inference Tasks."
        Organizational Behavior and Human Performance, Vol. 26, pp. 54-64.

Brehmer B. and K.R. Hammond
1977    "Cognitive Factors in Interpersonal Conflict". In D. Druckman
        (ed.) Negotiations: Social Psychological Perspectives. Beverly
        Hills, California: Sage Publications.

Brehmer B. and G. Quarnstrom
1976    "Information-Integration and Subjective Weights in Multiple-Cue
        Judgments." Organizational Behavior and Human Performance, Vol.
        17, pp. 118-126.

Brunswik E.
1955(a) "Representative Design and Probabilistic Theory in a Functional
        Psychology." Psychological Review, Vol. 62, pp. 193-217.

Brunswik E.
1955(b) "In Defense of Probabilistic Functionalism: A Reply."
        Psychological Review, Vol. 62, pp. 236-242.

Bucuvalas M.J.
1978    "The General Model and the Particular Decision: Decision Makers'
        Awareness of their Cue Weightings." Organizational Behavior and
        Human Performance, Vol. 22, pp. 325-349.

Camerer C.
1981    "General Conditions for the Success of Bootstrapping Models."
        Organizational Behavior and Human Performance, Vol. 27, pp.
        411-422.

Cattin P.
1980    "Estimation of the Predictive Power of a Regression Model."
        Journal of Applied Psychology, Vol. 65, pp. 407-414.

Cohen J. and P. Cohen
1975    Applied Multiple Regression/Correlation Analysis for the
        Behavioral Sciences. Hillsdale, N.J.: Lawrence Erlbaum
        Associates.



Cook R.L. and T.R. Stewart
1975    "A Comparison of Seven Methods for Obtaining Subjective
        Descriptions of Judgmental Policy." Organizational Behavior and
        Human Performance, Vol. 13, pp. 31-45.

Darlington R.B.
1968    "Multiple Regression in Psychological Research and Practice."
        Psychological Bulletin, Vol. 69, pp. 161-182.

Dawes R.M.
1979    "The Robust Beauty of Improper Linear Models in Decision Making".
        American Psychologist, Vol. 34, pp. 571-582.

Dawes R.M.
1971    "A Case Study of Graduate Admissions: Application of Three
        Principles of Human Decision Making". American Psychologist,
        Vol. 26, pp. 180-188.

Dawes R.M. and B. Corrigan
1974    "Linear Models in Decision-Making". Psychological Bulletin,
        Vol. 81, pp. 95-106.

Dorans N. and F. Drasgow
1978    "Alternative Weighting Schemes for Linear Prediction."
        Organizational Behavior and Human Performance, Vol. 21, pp.
        316-345.

Dudycha L.W. and J.C. Naylor
1966    "Characteristics of the Human Inference Process in Complex Choice
        Behavior Situations." Organizational Behavior and Human
        Performance, Vol. I, pp. 110-128.



Einhorn H.J.
1971    "The Use of Non-Linear, Non-Compensatory Models As A Function of
        Task And Amount of Information." Organizational Behavior and
        Human Performance, Vol. 6, pp. 1-27.

Einhorn H.J.
1970    "The Use of Non-Linear, Non-Compensatory Models in
        Decision-Making." Psychological Bulletin, Vol. 73, pp. 221-230.

Einhorn H.J. and R.M. Hogarth
1982    "Prediction, Diagnosis, and Causal Thinking in Forecasting."
        Journal of Forecasting, Vol. 1, pp. 1-14.

Einhorn H.J. and C.T. Koelb
1982    "A Psychometric Study of Literary-Critical Judgment." Modern
        Language Studies, in press.

Einhorn H.J., D.N. Kleinmuntz and B. Kleinmuntz
1979    "Linear Regression and Process-Tracing Models of Judgment".
        Psychological Review, Vol. 86, pp. 465-485.

Einhorn H.J. and R.M. Hogarth
1975    "Unit Weighting Schemes for Decision Making." Organizational
        Behavior and Human Performance, Vol. 13, pp. 171-192.

Ericsson K.A. and H.A. Simon
1980    "Verbal Reports as Data." Psychological Review, Vol. 87, pp.
        215-251.



Goldberg L.R.
1976    "Man Versus Model of Man: Just How Conflicting Is That
        Evidence?" Organizational Behavior and Human Performance, Vol. 16,
        pp. 13-22.

Goldberg L.R.
1970    "Man Versus Model of Man: A Rationale, Plus Some Evidence, For A
        Method of Improving On Clinical Inferences". Psychological
        Bulletin, Vol. 73, No. 6, pp. 422-432.

Goldberg L.R.
1968    "Simple Models Or Simple Processes? Some Research On Clinical
        Judgment." American Psychologist, Vol. 23, pp. 483-496.

Gordon R.A.
1968    "Issues in Multiple Regression." American Journal of Sociology,
        Vol. 73, pp. 592-616.

Gray C.W.
1979    "Ingredients of Intuitive Regression." Organizational Behavior
        and Human Performance, Vol. 23, pp. 30-48.

Green P.E. and D.S. Tull
1970    Research for Marketing Decisions. Second Edition. Englewood
        Cliffs, N.J.: Prentice Hall.

Hammond K.R.
1955    "Probabilistic Functioning and the Clinical Method."
        Psychological Review, Vol. 62, pp. 255-262.



Hammond K.R. and B.A. Marvin
1981    Report to the Rocky Flats Monitoring Committee Concerning
        Scientists' Judgments of Cancer Risk. Center for Research on
        Judgment and Policy, Report No. 232, University of Colorado.

Hammond K.R. and N.E. Wascoe (eds.)
1980    New Directions for Methodology of Social and Behavioral Science:
        Realizations of Brunswik's Representative Design. San Francisco:
        Jossey-Bass.

Hammond K.R., G.H. McClelland and J. Mumpower
1980    Human Judgment and Decision Making: Theories, Methods and
        Procedures. New York, N.Y.: Praeger.

Hammond K.R., J. Rohrbaugh, J. Mumpower and L. Adelman
1977    "Social Judgment Theory: Applications in Policy Formation". In
        M.F. Kaplan and S. Schwartz (eds.) Human Judgment and Decision
        Processes in Applied Settings. N.Y.: Academic Press.

Hammond K.R. and L. Adelman
1976    "Science, Values and Human Judgment." Science, Vol. 194, pp.
        389-396.

Hammond K.R., T.R. Stewart, B. Brehmer and D.D. Steinmann
1975    "Social Judgment Theory". In M.F. Kaplan and S. Schwartz (eds.)
        Human Judgment and Decision Processes. New York, N.Y.: Academic
        Press.

Hammond K.R. and B. Brehmer
1973    "Quasi-Rationality and Distrust: Implications for International
        Conflict". In L. Rappoport and D.A. Summers (eds.) Human
        Judgment and Social Interaction. N.Y.: Holt, Rinehart and
        Winston.

Hobson C.J., R.M. Mendel and F.W. Gibson
1981    "Clarifying Performance Appraisal Criteria." Organizational
        Behavior and Human Performance, Vol. 28, pp. 164-168.

Hoffman P.J.
1968    "Cue Consistency and Configurality in Human Judgment." In B.
        Kleinmuntz (ed.) Formal Representations of Human Judgment, pp.
        53-90. New York, N.Y.: John Wiley and Sons.

Hoffman P.J.
1960    "The Paramorphic Representation of Clinical Judgment".
        Psychological Bulletin, Vol. 57, pp. 116-131.

Hogarth R.M.
1980    Judgment and Choice. N.Y.: John Wiley and Sons.

Holbrook M.B.
1981    "Integrating Compositional and Decompositional Analyses to
        Represent the Intervening Role of Perceptions in Evaluative
        Judgments." Journal of Marketing Research, Vol. XVIII, pp.
        13-28.



Keren G. and J.R. Newman
1978    "Additional Considerations with Regard to Multiple Regression and
        Equal Weighting." Organizational Behavior and Human Performance,
        Vol. 22, pp. 143-164.

Kerlinger F.N. and E.J. Pedhazur
1973    Multiple Regression in Behavioral Research. New York, N.Y.:
        Holt, Rinehart and Winston.

Lane D.M., K.R. Murphy and T.E. Marques
1982    "Measuring the Importance of Cues in Policy Capturing."
        Organizational Behavior and Human Performance, Vol. 30, pp. 231-240.

Libby R.
1976(a) "Man Versus Model of Man: Some Conflicting Evidence."
        Organizational Behavior and Human Performance, Vol. 16, pp. 1-12.

Libby R.
1976(b) "Man Versus Model of Man: The Need for a Nonlinear Model."
        Organizational Behavior and Human Performance, Vol. 16, pp. 23-26.

Martin W.S.
1973    "The Effects of Scaling on the Correlation Coefficient: A Test
        of Validity." Journal of Marketing Research, Vol. X, pp. 316-318.

Meehl P.E.
1954    Clinical Versus Statistical Prediction. Minneapolis, Minnesota:
        University of Minnesota Press.

Milgram S.
1974    Obedience to Authority: An Experimental View. New York, N.Y.:
        Harper and Row.



Nie N.H. et al.
1975    SPSS. Second Edition. New York, N.Y.: McGraw-Hill Book Co.

Ogilvie J.R. and N. Schmitt
1979    "Situational Influences on Linear and Nonlinear Use of
        Information." Organizational Behavior and Human Performance,
        Vol. 23, pp. 292-306.

Parker D.B.
1976    Crime by Computer. New York, N.Y.: Charles Scribner's Sons.

Phelps R.H. and J. Shanteau
1978    "Livestock Judges: How Much Information Can An Expert Use?"
        Organizational Behavior and Human Performance, Vol. 21, pp.
        209-219.

Schmidt F.L.
1972    "The Reliability of Differences Between Linear Regression Weights
        in Applied Differential Psychology." Educational and
        Psychological Measurement, Vol. 32, pp. 879-886.

Schmitt N.
1978    "Comparison of Subjective and Objective Weighting Strategies in
        Changing Task Situations." Organizational Behavior and Human
        Performance, Vol. 21, pp. 171-188.

Schmitt N. and R.L. Levine
1977    "Statistical and Subjective Weights: Some Problems and
        Proposals." Organizational Behavior and Human Performance, Vol.
        20, pp. 15-30.



Schoemaker P.J.H. and C.C. Waid
1982    "An Experimental Comparison of Different Approaches to
        Determining Weights in Additive Utility Models." Management
        Science, in press.

Shapira
1981    A Study of Information Utilization in Disposition Assignments of
        Probation Officers. Unpublished Ph.D. dissertation, The Hebrew
        University of Jerusalem.

Slovic P., B. Fischhoff and S. Lichtenstein
1977    "Behavioral Decision Theory". Annual Review of Psychology, Vol.
        28, pp. 1-39.

Slovic P. and S.C. Lichtenstein
1971    "Comparison of Bayesian and Regression Approaches to the Study of
        Information-Processing in Judgment". Organizational Behavior and
        Human Performance, Vol. 6, pp. 629-744.

Stillwell W.C., D.A. Seaver and W. Edwards
1981    "A Comparison of Weight Approximation Techniques in
        Multiattribute Utility Decision Making." Organizational Behavior
        and Human Performance, Vol. 28, pp. 62-77.

Tucker L.R.
1964    "A Suggested Alternative Formulation in the Developments by
        Hursch, Hammond and Hursch, and by Hammond, Hursch and Todd."
        Psychological Review, Vol. 71, pp. 528-530.

Wonnacott R.J. and T.H. Wonnacott
1979    Econometrics. Second edition. New York, N.Y.: John Wiley and
        Sons.



Table 1

Variations in Judgmental Policies (Cue Utilization and Beta Weights) According to
Three Decision-Rules for Model Building

Summary Data                             A       B       C

Average number of cues in equations      5.3     7.0     4.7
Average R2                               .632    .657    .685
Average Adjusted R2                      (values illegible in source)

Key to cues: (a) sex; (b) age; (c) ethnic origin; (d) I.Q.; (e) high school
graduation grade; (f) socio-economic background; (g) marital status; (h) health;
(i) achievement expectations; (j) nature of relations with high school teachers;
(k) time spent doing homework during last year of high school; (l) fear of
failure; (m) living expenses arrangements; (n) political activities; (o) social
connections with university staff members; (p) sociability.

Decision rules for model building: inclusion in the equation of (A) the
variables with a significant beta weight at the .05 level or better only; (B) all
the variables which contribute at least 1% of explained variance to the
equation; (C) all the variables which contribute any measurable amount of
explained variance to the equation.

See text for details of adjustment.



Table 2

Subjects' Judgmental Policies as Reflected in the Object Samples
Split into Two Sequential Subsamples

76 9 4 
89 3 
66 9 
67 6 ss 
79 5) 
69" OS 
760 1) 
79 4 
76 6 
78 8 
73. ~ 10 
-U 7 
70 10 
55 ? 
52 8 
+29 10 
60 10 
ey 
.25 5 * 
.61 10 
69 8 
63 6 
: 88 8 
«79 8 
77 TH 


Nature of Sample: I, II, I+II

Average R2
Average Adjusted R2
Average number of cues in equations

* See key to Table 1.

* I, Sequential profiles 1-36; II, Sequential profiles 37-72; B, whole sample of 72 (taken from Table 1).


TABLE 3

Summary Data of Subjects' Judgmental Policies
As Reflected in 1) The Randomly Split Object Samples
2) The Sequentially Split Object Samples 3) The Whole
Object Sample (All the equations in accordance with
modeling decision-rule B in Table 1)

(I) Average in Randomly Split Subsamples
(II) Average in Sequentially Split Subsamples
(III) Value in Whole Sample


TABLE 4

Values of R2's for Selected Values of R, and Combinations of k and n.


Table 5

Relationship Between Actual Judgments And Self-Insights About the Cues
And the Clusters Used in Making These Judgments

1 = Actual Judgments; 2 = Judgments Predicted From Self-Insight About
The Individual Cues Used And Their Weights; 3 = Judgments Predicted
From Self-Insight About Inferred Clusters And Their Weights; 4 =
Judgments Predicted From Self-Insight About Individual Cues Used
(unweighted); 5 = Judgments Predicted From Self-Insight About Inferred
Clusters (unweighted).

Column headings:
(I) R2 1,2 (squared correlation between actual judgments and judgments
predicted from self-insight about individual cues and their weights)
(II) R2 1,3 (squared correlation between actual judgments and judgments
predicted from self-insight about inferred clusters and their weights)
(III) Direction of difference between columns II-I
(IV) R2 1,4 (squared correlation between actual judgments and judgments
predicted from self-insight about unweighted individual cues)
(V) R2 1,5 (squared correlation between actual judgments and judgments
predicted from self-insight about unweighted clusters)
(VI) Direction of difference between columns V-IV
(VII), (VIII) Increase in R2 obtained from self-insight (remainder of these
headings illegible in source)*

f 4 .38 - .00 02. |; 
5 39 ie -00 .O1 
a 34 - -00 -90 
8 61 | = .00 -O1 
| 

* Decreases in R2 are reported as zero gain.



TABLE 6

Number of Cues and Concepts Used in Judgments
(Data from Self-Insight)


Number of Number of Number of Cues In 
Cues Reported Clusters Clusters: 
Having Been Identified 


Counted| Counted as 


